Replica method and random matrices (I)

$\def\blambda{\boldsymbol \lambda}$ $\def\bX{\boldsymbol X}$ $\def\N{\mathbb N}$ $\def\P{\mathbb P}$ $\def\bi{\boldsymbol i}$ $\def\bG{\boldsymbol G}$ $\def\bsigma{\boldsymbol \sigma}$ $\def\bv{\boldsymbol v}$ $\def\bu{\boldsymbol u}$ $\def\sT{\mathsf T}$ $\def\bW{\boldsymbol W}$ $\def\bA{\boldsymbol A}$ $\def\R{\mathbb R}$ $\def\S{\mathbb S}$ $\def\GOE{\text{GOE}}$ $\def\|{\Vert}$ $\def\bx{\boldsymbol x}$ $\def\cN{\mathcal N}$ $\def\E{\mathbb E}$ $\def\de{\text{d}}$ $\def\vphi{\varphi}$ $\def\bQ{\boldsymbol Q}$ $\def\diag{\text{diag}}$ $\def\bzero{\boldsymbol 0}$ $\def\id{\mathbf I}$ $\def\ones{\mathbf 1}$ $\def\ext{\text{ext}}$ $\def\|{\Vert}$ $\def\bLambda{\boldsymbol \Lambda}$ $\def\const{\text{const}}$ $\def\Unif{\text{Unif}}$ $\def\bSigma{\boldsymbol \Sigma}$

1. Introduction

In my first several posts, I will discuss about the replica method. The replica method is a non-rigorous but powerful approach that originated within the statistical physics literature. It has been widely applied in other fields, including coding theory and high dimensional statistics. I used this method several times in my research. It helped me quickly figure out the answer of numerous problems.

Most physicists know the replica method better than me. Here I would like to introduce the replica method to theorists outside the physics community. I will not assume knowledges of statistical physics from the audience, but I will occasionally use terminology from physics. For a more sophisticated introduction to the replica method, I recommend Chapter 8 of this fantastic book [1].

In this post, I will discuss how to use the replica method to calculate the spectral norm of certain random matrices. In the next few posts, I will discuss how to use it to calculate the spectral density (Stieltjes transforms) of certain random matrices.

2. A motivating example: spiked GOE matrix

We consider a symmetric random matrix $\bA \in \R^{n \times n}$, \[ \bA= \lambda \bu \bu^\sT + \bW, \] where $\lambda \ge 0$ is the signal to noise ratio, $\bu \in \S^{n-1} \equiv \{ \bx \in \R^n: \| \bx \|_2 = 1 \}$ is a spike vector, and $\bW \sim \GOE(n)$: meaning that $\bW \in \R^{n \times n}$ is a symmetric matrix, $W_{ij} \sim \cN(0, 1/n)$ for $1 \le i < j \le n$, and $W_{ii} \sim \cN(0, 2/n)$ for $1 \le i \le n$.

We are interested in the largest eigenvalue and the corresponding eigenvector of $\bA$, which are respectively denoted by $\lambda_{\max}(\bA)$ and $\bv_{\max}(\bA)$. The result below is called the BBP phase transition phenomenon (see [2] for spiked Wishart matrix and [3] for spiked Wigner matrix).

The largest eigenvalue $\lambda_{\max}(\bA)$:
For $0 \le \lambda \le 1$, we have \[ \lim_{n \to \infty} \E[\lambda_{\max} ({\bA})] = 2; \] For $\lambda > 1$, we have \[ \lim_{n \to \infty} \E[\lambda_{\max} ({\bA})] = \lambda + 1/\lambda. \]
The correlation of top eigenvector $\bv_{\max}(\bA)$ with $\bu$:
For $0 \le \lambda \le 1$, we have \[ \lim_{n \to \infty} \E[\langle \bv_{\max}(\bA), \bu \rangle^2] = 0; \] For $\lambda > 1$, we have \[ \lim_{n \to \infty} \mathbb{E}[\langle {\boldsymbol v}_{\max}({\boldsymbol A}), {\boldsymbol u}\rangle^2] = 1 - 1/\lambda^2. \]

In the rest of this blog post, we will derive this result using the replica method.

3. Tricks in statistical physics

3.1. The free energy trick

Let $H: \Omega \to \R$ and $f: \Omega \to \R$ be two functions. We would like to calculate the following quantities \[ \begin{aligned} &\max_{\bsigma \in \Omega} H(\bsigma),
&f(\arg \max_{\bsigma} H(\bsigma)). \end{aligned} \]

The free energy trick works as follows. Define a Gibbs measure \[ \mu_{\beta, h}(\de \bsigma) = \frac{1}{Z(\beta, h)} \exp\{ \beta H(\bsigma) + h f(\bsigma)\} \nu_0(\de \bsigma), \] where $Z(\beta, h)$ denotes the normalizing constant \[ \begin{align}\tag{Def Z} \label{eqn:def_Z} Z(\beta, h) = \int_{\Omega} \exp\{ \beta H(\bsigma) + h f(\bsigma)\} \nu_0(\de \bsigma), \end{align} \] and $\nu_0$ denotes a reference measure on $\Omega$. In physics, $\beta$ stands for the inverse temperature, and $h$ stands for the strength of external field. Define the free energy function by \[ \Psi(\beta, h) = \log Z(\beta, h). \] The following lemma shows that, the free energy function $\Psi$ can generate many interesting quantities, including $\max_{\bsigma \in \Omega} H(\bsigma)$ and $f(\arg \max_{\bsigma} H(\bsigma))$.

Denote the ensemble average operator $\langle \cdot \rangle$ by \[ \langle g \rangle_{\beta, h} = \int_\Omega g(\bsigma) \mu_{\beta, h}(\de \bsigma). \] Then we have \begin{align}\tag{1}\label{eq:1} \partial_\beta \Psi(\beta, h) =& \langle H \rangle_{\beta, h},\\ \partial_h \Psi(\beta, h) =& \langle f \rangle_{\beta, h},\\ \end{align} and \begin{align}\tag{2}\label{eq:2} \lim_{\beta \to \infty} \partial_\beta \Psi(\beta, h) =& \max_{\bsigma \in \Omega} H(\bsigma),\\ \lim_{\beta \to \infty} \partial_h \Psi(\beta, h) =& f(\arg\max_{\bsigma \in \Omega} H(\bsigma) ). \end{align}

Equation \eqref{eq:1} follows from basic calculus of exponential family. The intuition for Equation \eqref{eq:2} gives the following: for $h = 0$ and large $\beta$, the measure $\mu_{\beta, 0}$ concentrates at the maxima of $H$.

In the spiked GOE model, the Hamiltonian associated with the random matrix $\bA$ is defined as \[ H_{n, \lambda}(\bsigma) = \langle \bsigma, \bA \bsigma \rangle = \lambda \langle \bu, \bsigma\rangle^2 + \langle \bsigma, \bW \bsigma\rangle. \] Denote the (random) partition function associated with the Hamiltonian $H_{n, \lambda}$ by \[ Z(\beta, \lambda, n) = \int_{\S^{n-1}} \exp\{ \beta n H_{n, \lambda}( \bsigma ) \} \nu_0(\de \bsigma). \] Here $\nu_0$ is the uniform probability measure on $\S^{n-1}$, and the $n$ factor in the exponent serves for proper normalization.

Define \[ \begin{align}\tag{3}\label{eq:3} \vphi_n(\lambda) =& \lim_{\beta \to \infty} \frac{1}{\beta n}\E[\log Z(\beta, \lambda, n)],\
\vphi(\lambda) =& \lim_{n\to \infty} \vphi_n(\lambda) = \lim_{n \to \infty} \lim_{\beta \to \infty} \frac{1}{\beta n}\E[\log Z(\beta, \lambda, n)]. \end{align} \] The function $\vphi(\lambda)$ describes the asymptotics of $\E[\lambda_{\max}(\bA)]$ and $\E[\langle \bv_{\max}(\bA), \bu\rangle^2]$, as shown in the lemma below.

For $\bA = \lambda \bu \bu^\sT + \bW \in \R^{n \times n}$ , we have \[ \begin{aligned} \vphi_n(\lambda) =& \E[\sup_{\bsigma} H_{n, \lambda}(\bsigma)]= \E[\lambda_{\max} (\bA)],\\ \vphi_n'(\lambda) =& \E[\langle \bv_{\max}(\bA), \bu \rangle^2]. \end{aligned} \]

We leave the proof of this lemma to readers.

3.2. The replica trick

The difficulty of calculating $\vphi(\lambda)$ originated from the expectation of $\log Z$ as in Eq. \eqref{eq:3}. The replica trick can convert the problem of calculating the expectation of $\log Z$ into calculating the moments of $Z$. Calculating the moments of $Z$ is an easier problem.

In the language of mathematics, the replica trick gives the following formula \[\tag{4} \label{eq:4} \E[\log Z] = \lim_{k \to 0} \frac{1}{k}\log \E[Z^k ]. \] Here is a simple proof of this equality: \[ \begin{aligned} \E[\log Z] =& \E[(\log Z^k) / k] = \lim_{k \to 0} \E [ \log(1 + Z^k - 1) / k] = \lim_{k \to 0} \E [ ( Z^k - 1 ) / k ] \
=&\lim_{k \to 0} (\E [ Z^k] - 1)/k = \lim_{k \to 0} \frac{1}{k} \log (1 + \E[Z^k] - 1) = \lim_{k \to 0} \frac{1}{k}\log \E[Z^k ]. \end{aligned} \]

Using the definition of $\vphi$ in Eq. \eqref{eq:3} and the replica trick in Eq. \eqref{eq:4}, we can rewrite $\vphi$ as \[ \begin{aligned} \vphi(\lambda) \equiv& \lim_{n\to \infty} \lim_{\beta \to \infty} \frac{1}{\beta n}\E[\log Z(\beta, \lambda, n)] = \lim_{n \to \infty} \lim_{\beta \to \infty} \lim_{k \to 0}\frac{1}{\beta k n}\log \E[Z(\beta, \lambda, n)^k]. \
\end{aligned} \] With some heuristic change of limit (which we will not justify), we get \[ \vphi(\lambda) \stackrel{\cdot}{=} \lim_{\beta \to \infty} \lim_{k \to 0} \lim_{n \to \infty} \frac{1}{\beta k n}\log \E[Z(\beta, \lambda, n)^k]. \] There are three limits in the right hand side of the above equation. Define \[ \begin{align}\tag{Def S} \label{eqn:defS} S(k, \beta, \lambda) = \lim_{n \to \infty} \frac{1}{n}\log \E[Z(\beta, \lambda, n)^k], \end{align} \] and \[ \begin{align}\tag{Def $\Psi$} \label{eqn:defPsi} \Psi(\beta, \lambda) = \lim_{k \to 0} \frac{1}{k} S(k, \beta, \lambda). \end{align} \] This gives \[ \begin{align}\tag{Def $\vphi$} \label{eqn:defvphi} \vphi(\lambda) = \lim_{\beta \to \infty} \frac{1}{\beta}\Psi(\beta, \lambda). \end{align} \] In the following section, we calculate $S$, $\Psi$, and $\vphi$ functions sequentially.

4. The replica calculations

4.1. The large $n$ limit: moments calculation

4.1.1. A variational formula for $S$ function

Recall the definition of $S$ function given in Eq. \eqref{eqn:defS}. We claim the following equality for $k \in \N_+$ \begin{align}\tag{5}\label{eqn:S_function} S(k, \beta, \lambda) \equiv \lim_{n \to \infty} \frac{1}{n}\log\E[Z(\beta, \lambda, n)^k]= \sup_{\bQ \in \R^{(k+1) \times (k+1)}, \diag(\bQ) = 1, \bQ \succeq \bzero} {U(\bQ) }, \end{align} where \begin{align}\tag{6}\label{eqn:U_function} U(\bQ) = \beta \lambda \sum_{i=1}^k q_{0i}^2 + \beta^2 \sum_{ij = 1}^k q_{ij}^2 + \frac{1}{2} \log(\det(\bQ)), \end{align} and $\bQ \in \R^{(k + 1) \times (k + 1)}$ is symmetric, with (the index of $\bQ$ starts from $0$ and ends at $k$) \[ \begin{align}\tag{Def $\bQ$} \label{eqn:def_Q} \bQ = \begin{bmatrix} 1& q_{01} & q_{02} & \ldots & q_{0k}\
q_{01} & 1 & q_{12} & \ldots & q_{1k}\
q_{02} & q_{12} & 1 & \ldots & q_{2k} \
\ldots & \ldots & \ldots & \ldots & \ldots\
q_{0k} & q_{1k} & q_{2k}& \ldots & 1\
\end{bmatrix}. \end{align} \] Equation \eqref{eqn:S_function} is exact and rigorous, but the computation is involved. By calculating the moments of $Z$ directly (the calculation is given in Section 4.1.2), we get \begin{align}\tag{7} \label{eqn:moments_result} \E[Z(\beta, \lambda, n)^k] = \int \exp\{ n U(\bQ) + o(n)\} \de \bQ. \end{align} Eq. \eqref{eqn:S_function} follows from Eq. \eqref{eqn:moments_result} and Laplace method. We suggest the readers skipping the derivation of Eq. \eqref{eqn:moments_result} at the first time reading, and continue to read Section 4.2.

4.1.2. The replica calculations

In this section, we derive Eq. \eqref{eqn:moments_result}. At the first time reading, we suggest the readers skipping this section and continue to read Section 4.2.

Recall the definition of $Z$ given in Eq. \eqref{eqn:def_Z}. The $k$‘th moment of $Z(\beta, \lambda, n)$ gives \[ \begin{aligned} \E[Z(\beta, \lambda, n)^k] = & \E\Big[\Big(\int_{\S^{n-1}} \exp\{ \beta n H_{n, \lambda}( \bsigma ) \} \nu_0(\de \bsigma)\Big)^k \Big].
\end{aligned} \] Here, the expectation $\E$ is taken with respect to the randomness of matrix $\bA$ in $H_{n, \lambda}(\bsigma) = \langle \bsigma, \bA \bsigma\rangle$. The first trick is to rewrite the $k$‘th power of integration into integration with respect to $k$ replicas of $\bsigma$. In this way, we can exchange the expectation with integrals. \[ \begin{aligned} \E[Z(\beta, \lambda, n)^k] =&\E\Big[\Big(\int_{\S^{n-1}} \exp\{ \beta n H_{n, \lambda}( \bsigma ) \} \nu_0(\de \bsigma)\Big)^k \Big] = \E\Big[\int_{(\S^{n-1})^k} \exp\Big\{ \beta n \sum_{i = 1}^k H_{n, \lambda}( \bsigma_i ) \Big\} \prod_{i \in [k]} \nu_0(\de \bsigma_i) \Big]\
=&\int_{(\S^{n-1})^k} \E\Big[ \exp\Big\{ \beta n \sum_{i = 1}^k [\lambda \langle \bu, \bsigma_i\rangle^2 + \langle \bsigma_i, \bW \bsigma_i\rangle ] \Big\} \Big] \prod_{i \in [k]} \nu_0( \de \bsigma_i). \end{aligned} \] Let $\bG = (G_{ij})_{i, j \in [n]} \in \R^{n \times n}$ with $G_{ij} \sim_{iid} \cN(0, 1)$. Noting that the GOE matrix $\bW$ shares the same distribution with matrix $(\bG + \bG^\sT) / \sqrt{2 n}$, we have \[ \begin{aligned} \E[Z(\beta, \lambda, n)^k] =& \int_{(\S^{n-1})^k} \E_{G_{ij} \sim \cN(0,1)}\Big[ \exp\Big\{ \beta n \lambda \sum_{i = 1}^k \langle \bu, \bsigma_i\rangle^2 + \beta n \Big\langle (\bG + \bG^\sT) / \sqrt{2n}, \sum_{i=1}^k \bsigma_i\bsigma_i^\sT \Big\rangle \Big\} \Big] \prod_{i \in [k]} \nu_0(\de \bsigma_i) \
=& \int_{(\S^{n-1})^k} \exp\Big\{ \beta n \lambda \sum_{i = 1}^k \langle \bu, \bsigma_i\rangle^2 \Big\} \times \E_{G_{ij} \sim \cN(0,1)}\Big[ \exp\Big\{ \beta \sqrt{2 n} \Big\langle \bG, \sum_{i=1}^k \bsigma_i\bsigma_i^\sT \Big\rangle \Big\} \Big] \prod_{i \in [k]} \nu_0(\de \bsigma_i). \end{aligned} \] Using the formula of Gaussian moment generating function $\E_{G \sim \cN(0, 1)}[e^{tG}] = e^{t^2/2}$, we get \[ \begin{aligned} \E[Z(\beta, \lambda, n)^k] =& \int_{(\S^{n-1})^k} \exp\Big\{ \beta n \lambda \sum_{i = 1}^k \langle \bu, \bsigma_i \rangle^2 + n \beta^2 \Big \Vert \sum_{i=1}^k \bsigma_i \bsigma_i^\sT \Big\Vert_F^2 \Big\} \prod_{i \in [k]} \nu_0(\de \bsigma_i) \
=& \int_{(\S^{n-1})^k} \exp\Big\{ \beta n \lambda \sum_{i = 1}^k \langle \bu, \bsigma_i \rangle^2 + n \beta^2 \sum_{i, j=1}^k\Big\langle \bsigma_i\bsigma_i^\sT, \bsigma_j \bsigma_j^\sT\Big \rangle \Big\} \prod_{i \in [k]} \nu_0(\de \bsigma_i) \
=& \int_{(\S^{n-1})^k} \exp\Big\{ \beta n \lambda \sum_{i = 1}^k \langle \bu, \bsigma_i \rangle^2 + n \beta^2 \sum_{i, j=1}^k \langle \bsigma_i, \bsigma_j \rangle^2 \Big\} \prod_{i \in [k]} \nu_0(\de \bsigma_i). \end{aligned} \]

In the following, we make use a heuristic argument (using Dirac delta function). Note we have \[ 1 = \int \prod_{i=1}^k \delta \Big( \langle \bu,\bsigma_i\rangle - q_{0i} \Big) \prod_{1 \le i < j\le k} \delta\Big( \langle\bsigma_i, \bsigma_j\rangle - q_{ij} \Big) \de \bQ \] where $\de \bQ \equiv \prod_{i \in [k]} \de q_{0i} \prod_{1 \le i < j \le k} \de q_{ij}$. Using this equality, we get \[ \begin{aligned} \E[Z(\beta, \lambda, n)^k] =& \int \de \bQ \exp\Big\{\beta n \lambda \sum_{i = 1}^k q_{0i}^2 + n \beta^2 \sum_{i, j=1}^k q_{ij}^2 \Big\} \cdot \int_{(\R^n)^k} \prod_{i=1}^k \delta \Big( \langle \bu,\bsigma_i\rangle - q_{0i} \Big) \prod_{1 \le i < j\le k} \delta\Big( \langle\bsigma_i, \bsigma_j\rangle - q_{ij} \Big) \prod_{i \in [k]} \nu_0(\de \bsigma_i) \
=& \int \de \bQ \exp\Big\{\beta n \lambda \sum_{i = 1}^k q_{0i}^2 + n \beta^2 \sum_{i, j=1}^k q_{ij}^2 + n H_n(\bQ) \Big\},
\end{aligned} \] where $\bQ \in \R^{(k + 1) \times (k + 1)}$ is given by Eq. \eqref{eqn:def_Q} and \[ \begin{aligned} H_n(\bQ) =& \frac{1}{n} \log \int_{(\R^n)^k} \prod_{i=1}^k \delta \Big( \langle \bu,\bsigma_i\rangle - q_{0i} \Big) \prod_{1 \le i < j\le k} \delta\Big( \langle\bsigma_i, \bsigma_j\rangle - q_{ij} \Big) \prod_{i \in [k]} \nu_0(\de \bsigma_i) \
=& \frac{1}{n} \log \int_{(\R^n)^{k+1}} \prod_{0 \le i < j\le k} \delta\Big( \langle\bsigma_i, \bsigma_j\rangle - q_{ij} \Big) \prod_{0 \le i \le k} \nu_0( \bsigma_i), \end{aligned} \] which serves as the rate function of the random matrix $\bSigma = (\langle \bsigma_i, \bsigma_j \rangle)_{0 \le i, j \le k} \in \R^{(k+1) \times (k+1)}$ when $(\bsigma_i)_{0 \le i \le k} \sim_{iid} \Unif(\S^{n-1})$. This rate function can be found in standard textbooks \[ \lim_{n \to \infty} H_n(\bQ) = \frac{1}{2}\log \det(\bQ). \] Here we give a heuristic argument to calculate the entropy term $H_n(\bQ)$, (here the argument is handwaving, but can be made rigorous)

\[ \begin{aligned} H_n(\bQ) \stackrel{\cdot}{=}& \frac{1}{n} \log \P_{(\bsigma_i)_{i \in [k]} \sim \Unif(\S^{n-1}(\sqrt{n}))} \Big( \big(\langle \bsigma_i, \bsigma_j \rangle / n \big)_{i, j \in [k]}\approx \bQ \Big)\\\ \stackrel{\cdot}{=}& \inf_{\lambda_{ij}} \frac{1}{n} \log \int_{(\R^n)^{k+1}} \prod_{0 \le i \le j\le k} \exp\Big\{ - \lambda_{ij}\langle\bsigma_i, \bsigma_j\rangle / 2 + n q_{ij} \lambda_{ij} / 2 \Big\} \prod_{0 \le i \le k} \de \bsigma_i + \const\\\ =& \inf_{\lambda_{ij}} \log \int_{\R^{k+1}} \prod_{0 \le i \le j\le k} \exp\Big\{ - \lambda_{ij} \sigma_i\sigma_j / 2 + q_{ij} \lambda_{ij}/2 \Big\} \prod_{0 \le i \le k} \de \sigma_i + \const\\\ =& \inf_{\bLambda = (\lambda_{ij})_{0 \le i \le j \le k}} \Big[ \langle \bQ, \bLambda\rangle / 2 - \log(\det(\bLambda))/2 \Big] + \const\\\ \stackrel{\cdot}{=}& \frac{1}{2} \log \det(\bQ), \end{aligned} \]

where we use the fact that, for $\bX_i \sim_{iid} \bX$, we have \[ \lim_{n \to \infty} \frac{1}{n} \log \P_{\bX_i}\Big( \frac{1}{n} \sum_{i=1}^n \bX_i \approx \bx \Big) = \inf_{\blambda} \log \E_\bX \Big[\exp\{ \langle\blambda, \bX - \bx\rangle \} \Big]. \] This gives Eq. \eqref{eqn:moments_result}.

4.2. The small $k$ limit: replica symmetric ansatz

4.2.1. Replica symmetric ansatz

Next we calculate (recall Eq. \eqref{eqn:defPsi}) \[ \Psi(\beta, \lambda) = \lim_{k \to 0} S(\beta, \lambda, k)/k. \] We gave the expression of $S$ when $k$ is integral (c.f. Eq. \eqref{eqn:S_function}), where $k$ serves as the dimension of variable $\bQ$ in the variational formula. However, to take the limit $k \to 0$, we need the expression for $S$ when $k$ takes real numbers. This is the difficulty to calculate the small $k$ limit using Eq. \eqref{eqn:S_function}.

The physicists’ trick is to plug in an ansatz for the matrix $\bQ$ to simplify the expression of $S$. Note that $U(\bQ)$ defined in Eq. \eqref{eqn:U_function} has some symmetry: if we permute the $1$‘st to $k$‘th rows of $\bQ$, and perform the same permutation to its columns, the function $U$ stays the same. This motivates us to consider the “replica symmetric ansatz”: \[ \bQ = \begin{bmatrix} 1& \mu & \mu & \ldots & \mu\
\mu & 1 & q& \ldots & q\
\mu & q & 1 & \ldots & q \
\ldots & \ldots &\ldots & \ldots & \ldots\
\mu & q & q & \ldots & 1\
\end{bmatrix}, \] or equivalently \[ q_{0i} = \mu, ~~ \forall i \in [k], ~~~~~~~~~~~~~~ q_{ij} = q, ~~ \forall 1 \le i < j \le k. \] Plugging this ansats into Eq. \eqref{eqn:U_function}, and simplifying the log determinant term using the matrix determinant lemma \[ \begin{aligned} &\log(\det(\bQ)) = \log \det( (1-q) \id_k + (q - \mu^2) \ones \ones^\sT) \
=& \log \det( \id_k + [(q - \mu^2)/(1-q) ] \ones \ones^\sT) + k \log(1-q)\
=& \log(1 + k [(q - \mu^2)/(1-q) ]) + k \log(1-q)\
=& \log\Big( 1 - \frac{\mu^2 k}{1 + (k - 1)q} \Big) + \log \Big(1 + \frac{kq}{1-q} \Big) + k \log(1 - q), \end{aligned} \] we get \[ U(\bQ) = \beta \lambda k \mu^2 + \beta^2 k + \beta^2 k (k-1) q^2 + \frac{1}{2} \Big[ \log\Big( 1 - \frac{\mu^2 k}{1 + (k - 1)q} \Big) + \log \Big(1 + \frac{kq}{1-q} \Big) + k \log(1 - q) \Big] \equiv U(\mu, q, k). \]

Assuming that the maximum of $U$ is taken at a “replica symmetric” form of $\bQ$, we get \[ S(\beta, \lambda, k) \stackrel{\cdot}{=} \sup_{\mu, q} U(\mu, q, k). \]

4.2.2. The small $k$ limit

Using the replica symmetric ansatz, we gave an expression for $S$ for any $k > 0$. Next we would like to calculate the $k \to 0$ limit (recall the definition of $\Psi$ given in Eq. \eqref{eqn:defPsi}) \[ \Psi(\beta, \lambda) = \lim_{k \to 0} S(\beta, \lambda, k) / k = \lim_{k \to 0} \sup_{\mu, q} \frac{1}{k} U(\mu, q, k). \] Define \[ u(\mu, q) \equiv \lim_{k \to 0} \frac{1}{k} U(\mu, q, k) = \beta \lambda \mu^2 + \beta^2 (1 - q^2) - \frac{\mu^2}{2(1 - q)} + \frac{q}{2 (1 - q)} + \frac{1}{2}\log(1 - q). \] We need to exchange the operations $\lim_{k \to 0}$ and $\sup_{\mu, q}$ in some way. Another heuristic physicists’ convention comes here: the $\lim_{k \to 0} \sup_{\mu, q}$ operation becomes the $\ext_{\mu, q} \lim_{k \to 0}$ operation, where $\ext_{\bx} f(\bx)$ is a set defined as \[ \ext_{\bx} f(\bx) = \Big\{ f(\bx_\star): \nabla f(\bx_\star) = \bzero\Big\}. \] Using this convention, we get \[ \Psi(\beta, \lambda) = \lim_{k \to 0} S(\beta, \lambda, k) / k \in \ext_{\mu, q}u(\mu, q), \]

Next, we calculate all the stationary points of $u$, and collect the value of $u$ evaluated at these stationary points.

By basic calculus, we have \[ \begin{aligned} \partial_\mu u(\mu, q) =& 2 \Big( \beta \lambda - \frac{1}{2 (1-q)} \Big) \mu, \
\partial_q u(\mu, q) =& -2 \beta^2 q - \frac{\mu^2}{2(1 - q)^2} + \frac{1}{2 (1 - q)^2}. \end{aligned} \] When $\beta > 1$, by basic algebra, we find two solutions of $\nabla u = \bzero$.

One branch of solution of $\nabla u = \bzero$ gives \[ \begin{aligned} \mu_1 =& 0, \
q_1 =& 1 - \frac{1}{2 \beta}. \end{aligned} \] At this solution, we have \[ \Psi_1(\beta, \lambda) = u(\mu_1, q_1) = 2 \beta - \frac{3}{4} - \frac{1}{2} \log(2 \beta). \]
Another branch of solution of $\nabla u = \bzero$ gives \[ \begin{aligned} \mu_2 =& \Big( \Big(1 - \frac{1}{\lambda^2}\Big) \Big(1 - \frac{1}{2 \beta \lambda}\Big) \Big)^{1/2},\
q_2 =& 1 - \frac{1}{2 \lambda \beta}, \end{aligned} \] At this solution, we have \[ \Psi_2(\beta, \lambda) = u(\mu_2, q_2) = \beta\Big( \lambda + \frac{1}{\lambda}\Big) - \Big( \frac{1}{4 \lambda^2} + \frac{1}{2} \Big) - \frac{1}{2}\log(2 \lambda \beta). \]

Hence we get \[ \Psi \in \ext_{\mu, q}u(\mu, q) = \{\Psi_1, \Psi_2 \}. \]

4.3. The large $\beta$ limit

Recall the definition of $\vphi$ in Eq. \eqref{eqn:defvphi}. For the first branch $\Psi_1$, the corresponding $\vphi$ function gives \[ \vphi_1(\lambda) = \lim_{\beta \to \infty} \frac{1}{\beta} \Psi_1(\beta, \lambda) = 2. \] For the second branch $\Psi_2$, the corresponding $\vphi$ function gives \[ \vphi_2(\lambda) = \lim_{\beta \to \infty} \frac{1}{\beta} \Psi_2(\beta, \lambda) = \lambda + \frac{1}{\lambda}. \]

4.4. Select a branch

There are two branches of solutions. We need to select the branch that makes sense: we hope that $\vphi(\lambda)$ gives an expression for $\lim_{n\to \infty}\E[\lambda_\max(\bA)] = \lim_{n\to \infty}\E[\lambda_\max(\lambda \bu \bu^\sT + \bW)]$, hence it should be non-decreasing in $\lambda$, and it should satisfy $\lim_{\lambda \to \infty} \vphi(\lambda) = \infty$.

When $\lambda\le 1$, the branch $\vphi_2(\lambda) = \lambda + 1/\lambda$ is decreasing. Hence, we should select the branch $\vphi_1(\lambda) = 2$.
When $\lambda \ge 1$, the branch $\vphi_1(\lambda) = 2$ stays constant. Hence, we should select the branch $\vphi_2(\lambda) = \lambda + 1/\lambda$.

The discussion above gives the following prediction \[ \lim_{n \to \infty} \E[\lambda_{\max}(\bA)]= \vphi(\lambda) = \begin{cases} 2, ~~~& \lambda \le 1, \
\lambda + \frac{1}{\lambda}, ~~~& \lambda > 1. \end{cases} \] This turns out to be the correct answer!

5. Summary

In summary, we calculated the largest eigenvalue of the spiked GOE matrix $\bA = \lambda \bu \bu^\sT + \bW$. Using the free energy trick and the replica trick, the problem becomes calculating three limits sequentially: the large $n$ limit, the small $k$ limit, and the large $\beta$ limit. The large $n$ limit calculation is rigorous, but is also complicated. The small $k$ limit calculation is non-rigorous: we pluged in the replica symmetric ansatz, and changed the $\lim_{k \to 0} \inf_{\mu, q}$ operation to the $\ext_{\mu, q} \lim_{k \to 0}$ operation. The large $\beta$ limit calculation is straightforward. Finally, we get two branches of solutions. We used some simple properties of the largest eigenvalue of $\bA$ to select the branch when $\lambda$ is in different regime. The prediction formula turns out to be the correct answer.

References

[1] Mezard, Marc, and Andrea Montanari. Information, physics, and computation. Oxford University Press, 2009.

[2] Baik, Jinho, Gérard Ben Arous, and Sandrine Péché. “Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices.” The Annals of Probability 33.5 (2005): 1643-1697.

[3] Péché, Sandrine. “The largest eigenvalue of small rank perturbations of Hermitian random matrices.” Probability Theory and Related Fields 134.1 (2006): 127-173.