author Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-06-10 19:29:07 +0200
committer Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-06-10 19:29:07 +0200
commit 2de476de17aea90b040875a7e3e87c308944a035
tree c1e7eeca470f42d3c6fae69f48871b44725a4c8e
parent d02197918d759195dd1d863de28f352f70cbf14c
Lots of writing, especially in appendix on superspace.
-rw-r--r-- marginal.tex 169
1 files changed, 139 insertions, 30 deletions
diff --git a/marginal.tex b/marginal.tex
index 246bcaf..024b0ac 100644
--- a/marginal.tex
+++ b/marginal.tex
@@ -442,6 +442,40 @@ Finally, the marginal complexity is defined by evaluating the complexity conditi
=\Sigma_0(E,\mu_\text m(E))
\end{equation}
+\subsection{General features of saddle point computation}
+
+\begin{align}
+ \label{eq:delta.grad}
+ &\delta\big(\nabla H(\mathbf x_a,\pmb\omega_a)\big)
+ =\int\frac{d\hat{\mathbf x}_a}{(2\pi)^N}e^{i\hat{\mathbf x}_a^T\nabla H(\mathbf x_a,\pmb\omega_a)} \\
+ \label{eq:delta.energy}
+ &\delta\big(NE-H(\mathbf x_a)\big)
+ =\int\frac{d\hat\beta_a}{2\pi}e^{\hat\beta_a(NE-H(\mathbf x_a))} \\
+ &\delta\big(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\mathbf s\big)
+ \label{eq:delta.eigen}
+ =\int\frac{d\hat\lambda_a}{2\pi}e^{\hat\lambda_a(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\mathbf s)}
+\end{align}
+
+Here we will merely sketch the steps that are standard. We start by translating elements of the Kac--Rice measure into terms more familiar to physicists. This means using the integral representations \eqref{eq:delta.grad}, \eqref{eq:delta.energy}, and \eqref{eq:delta.eigen}
+for the Dirac $\delta$ functions. At this point we will also discuss an
+important step we will use repeatedly in this paper: dropping the absolute value
+signs around the determinant in the Kac--Rice measure. In general this changes
+what is being counted: since the sign of the Hessian determinant is
+$(-1)^{\text{index}}$, stationary points of different index contribute with
+opposite signs and can cancel, which can severely distort the complexity.
+However, the step is justified
+when the parameters of the problem, i.e., $E$, $\mu$, and $\lambda^*$, put us in
+a regime where the exponential majority of stationary points have the same
+index. This is true for maxima and minima, and for saddle points whose spectra have a strictly positive bulk with a fixed number of negative
+outliers. Dropping the absolute value sign allows us to write
+\begin{equation}
+ \det\operatorname{Hess}H(\mathbf x_a, \pmb\omega_a)
+ =\int d\pmb\eta_a\,d\bar{\pmb\eta}_a\,e^{\bar{\pmb\eta}_a^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\pmb\eta_a}
+\end{equation}
+for $N$-dimensional Grassmann variables $\bar{\pmb\eta}_a$ and $\pmb\eta_a$. For
+the spherical models this step is unnecessary, since there are other ways to
+treat the determinant that keep the absolute value signs, as in previous works
+\cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}. However, since some of
+our other examples involve models where those techniques are unavailable, it is
+useful to see the fermionic method in action in this simple case.
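+
+As a minimal illustration of both points, consider a toy problem with $N=1$,
+where a smooth function $f(x)$ of one real variable, hypothetical and chosen
+only for this example, stands in for the Hamiltonian. For an ordinary number
+$a$, the Grassmann integration rules reviewed in Appendix~\ref{sec:superspace},
+with iterated integrals read from the inside out, give
+\begin{equation}
+ \int d\eta\,d\bar\eta\,e^{\bar\eta a\eta}
+ =\int d\eta\,d\bar\eta\,(1+\bar\eta a\eta)
+ =a\int d\eta\,\bigg(\int d\bar\eta\,\bar\eta\bigg)\eta
+ =a
+\end{equation}
+which is the one-dimensional case of the fermionic representation of the
+determinant. Without the absolute value, each nondegenerate stationary point
+$x^*$ is weighted by the sign of its Hessian:
+\begin{equation}
+ \int dx\,\delta\big(f'(x)\big)\,f''(x)
+ =\sum_{x^*:f'(x^*)=0}\frac{f''(x^*)}{|f''(x^*)|}
+ =\sum_{x^*:f'(x^*)=0}\operatorname{sign}f''(x^*)
+\end{equation}
+For $f(x)=\frac14x^4-\frac12x^2$, the stationary points are $x^*=0$ with
+$f''=-1$ and $x^*=\pm1$ with $f''=2$, so this integral gives $1+1-1=1$ rather
+than the true count of $3$. When all the stationary points being counted share
+the same index, every term has the same sign and the signed count equals the
+true count up to an overall sign.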
+
\section{Examples}
\subsection{Spherical spin glasses}
@@ -475,36 +509,7 @@ calculation for this case, since we will something about its application in
more nontrivial settings.
The procedure to treat the complexity of the spherical models has been made in
-detail elsewhere \cite{Kent-Dobias_2023_How}. Here we will merely sketch the steps that are standard. We start by translating elements of the Kac--Rice measure into terms more familiar to physicists. This means writing
-\begin{align}
- \label{eq:delta.grad}
- \delta\big(\nabla H(\mathbf x_a,\pmb\omega_a)\big)
- &=\int\frac{d\hat{\mathbf x}_a}{(2\pi)^N}e^{i\hat{\mathbf x}_a^T\nabla H(\mathbf x_a,\pmb\omega_a)} \\
- \label{eq:delta.energy}
- \delta\big(NE-H(\mathbf x_a)\big)
- &=\int\frac{d\hat\beta_a}{2\pi}e^{\hat\beta_a(NE-H(\mathbf x_a))} \\
- \delta\big(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\mathbf s\big)
- \label{eq:delta.eigen}
- &=\int\frac{d\hat\lambda_a}{2\pi}e^{\hat\lambda_a(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\mathbf s)}
-\end{align}
-for the Dirac $\delta$ functions. At this point we will also discuss an
-important step we will use repeatedly in this paper: to drop the absolute value
-signs around the determinant in the Kac--Rice measure. This can potentially
-lead to severe problems with the complexity. However, it is a justified step
-when the parameters of the problem, i.e., $E$, $\mu$, and $\lambda^*$ put us in
-a regime where the exponential majority of stationary points have the same
-index. This is true for maxima and minima, and for saddle points whose spectra have a strictly positive bulk with a fixed number of negative
-outliers. Dropping the absolute value sign allows us to write
-\begin{equation}
- \det\operatorname{Hess}H(\mathbf x_a, \pmb\omega_a)
- =\int d\pmb\eta_a\,d\bar{\pmb\eta}_a\,e^{\bar{\pmb\eta}_a^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\pmb\eta_a}
-\end{equation}
-for $N$-dimensional Grassmann variables $\bar{\pmb\eta}_a$ and $\pmb\eta_a$. For
-the spherical models this step is unnecessary, since there are other ways to
-treat the determinant keeping the absolute value signs, as in previous works
-\cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}. However, since other of
-our examples are for models where the same techniques are impossible, it is
-useful to see the fermionic method in action in this simple case.
+detail elsewhere \cite{Kent-Dobias_2023_How}.
Once these substitutions have been made, the entire expression
\eqref{eq:min.complexity.expanded} is an exponential integral whose argument is
@@ -954,6 +959,110 @@ taking the zero-temperature limit, we find
\appendix
+\section{A primer on superspace}
+\label{sec:superspace}
+
+The superspace $\mathbb R^{N|2D}$ is a vector space with $N$ real indices and
+$2D$ Grassmann indices $\bar\theta_1,\theta_1,\ldots,\bar\theta_D,\theta_D$.
+The Grassmann indices anticommute like fermions, and integration over them is defined by
+\begin{equation}
+ \int d\theta\,\theta=1
+ \qquad
+ \int d\theta\,1=0
+\end{equation}
+Because the Grassmann indices anticommute, each of them squares to zero.
+Therefore, any series expansion of a function with respect to a single Grassmann
+index terminates at linear order, while a series expansion with
+respect to $n$ Grassmann indices terminates at $n$th order. If
+$f$ is an arbitrary smooth function and $a$ and $b$ are ordinary (commuting) numbers, then
+\begin{equation}
+ \int d\theta\,f(a+b\theta)
+ =\int d\theta\,\left[f(a)+f'(a)b\theta\right]
+ =f'(a)b
+\end{equation}
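+With two Grassmann indices the same reasoning applies, but the order of the
+measures matters. Reading iterated integrals from the inside out, which is the
+convention we assume here and the one consistent with the integrals below,
+\begin{equation}
+ \int d\theta\,d\bar\theta\,\bar\theta\theta
+ =\int d\theta\,\bigg(\int d\bar\theta\,\bar\theta\bigg)\theta
+ =1
+ \qquad
+ \int d\theta\,d\bar\theta\,\theta\bar\theta=-1
+\end{equation}
+so that, for ordinary numbers $a$ and $b$, only the term proportional to
+$\bar\theta\theta$ survives in
+\begin{equation}
+ \int d\theta\,d\bar\theta\,f(a+b\bar\theta\theta)
+ =\int d\theta\,d\bar\theta\,\big[f(a)+f'(a)b\bar\theta\theta\big]
+ =f'(a)b
+\end{equation}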
+This kind of behavior of integrals over the Grassmann indices makes them useful
+for compactly expressing the Kac--Rice measure. To see why, consider the
+specific superspace $\mathbb R^{N|2}$, where an arbitrary vector can be expressed as
+\begin{equation}
+ \pmb\phi(1)=\mathbf x+\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}
+\end{equation}
+where $\mathbf x,\hat{\mathbf x}\in\mathbb R^N$ and $\bar{\pmb\eta},\pmb\eta$ are
+$N$-dimensional Grassmann vectors. The argument 1 of $\pmb\phi$ labels the
+Grassmann indices $\bar\theta_1,\theta_1$ it contains, since we will
+sometimes want to use, e.g., $\pmb\phi(2)$, defined identically save for the
+substitution of $\bar\theta_2,\theta_2$. Consider the series expansion of an arbitrary function $f$ of this supervector:
+\begin{equation}
+ \begin{aligned}
+ f\big(\pmb\phi(1)\big)
+ &=f(\mathbf x)
+ +\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}\big)^T\partial f(\mathbf x) \\
+ &\quad+\frac12\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1\big)^T\partial\partial f(\mathbf x)\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1\big) \\
+ &=f(\mathbf x)
+ +\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}\big)^T\partial f(\mathbf x) \\
+ &\qquad-\bar\theta_1\theta_1\bar{\pmb\eta}^T\partial\partial f(\mathbf x)\pmb\eta
+ \end{aligned}
+\end{equation}
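+where terms of third and higher order are absent because each would contain
+$\bar\theta_1$ or $\theta_1$ at least twice, and in the last step we used the
+facts that the Hessian matrix is symmetric and that squares of Grassmann
+indices vanish. Using the integration rules defined above, we find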
+\begin{equation}
+ \int d\theta_1\,d\bar\theta_1\,f\big(\pmb\phi(1)\big)
+ =i\hat{\mathbf x}^T\partial f(\mathbf x)-\bar{\pmb\eta}^T\partial\partial f(\mathbf x)\pmb\eta
+\end{equation}
+These two terms are precisely the exponents appearing in the integral
+representations of the Dirac $\delta$ function of the gradient and of the
+determinant of the Hessian (without absolute value signs), which together make
+up the basic Kac--Rice measure, so that we can write
+\begin{equation}
+ \begin{aligned}
+ &\int d\mathbf x\,\delta\big(\nabla H(\mathbf x)\big)\,\det\operatorname{Hess}H(\mathbf x) \\
+ &\qquad=\int d\mathbf x\,d\bar{\pmb\eta}\,d\pmb\eta\,\frac{d\hat{\mathbf x}}{(2\pi)^N}\,e^{i\hat{\mathbf x}^T\nabla H(\mathbf x)-\bar{\pmb\eta}^T\operatorname{Hess}H(\mathbf x)\pmb\eta} \\
+ &\qquad=\int d\pmb\phi\,e^{\int d1\,H(\pmb\phi(1))}
+ \end{aligned}
+\end{equation}
+where we have written $d1=d\theta_1\,d\bar\theta_1$ and $d\pmb\phi=d\mathbf
+x\,d\bar{\pmb\eta}\,d\pmb\eta\,\frac{d\hat{\mathbf x}}{(2\pi)^N}$. Besides its deep connections
+to BRST symmetry, this compact notation dramatically simplifies the
+analytical treatment of the problem. This simplification is
+possible because a large variety of superspace algebraic and
+integral operations are direct analogues of their ordinary real
+counterparts. For instance, consider a linear superoperator $M(1,2)$, which,
+like the supervector $\pmb\phi$, is a linear combination of $N\times
+N$ ordinary or Grassmann matrices indexed by every nonvanishing combination of
+the Grassmann indices $\bar\theta_1,\theta_1,\bar\theta_2,\theta_2$. Such a supermatrix acts on supervectors by ordinary matrix multiplication and convolution in the Grassmann indices, i.e.,
+\begin{equation}
+ (M\pmb\phi)(1)=\int d2\,M(1,2)\pmb\phi(2)
+\end{equation}
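+For example, the identity operator, the analogue of the Kronecker delta (times
+an $N\times N$ identity matrix left implicit), can be written as
+$\delta(1,2)=(\bar\theta_1-\bar\theta_2)(\theta_1-\theta_2)$. Taking
+$d2=d\theta_2\,d\bar\theta_2$ by analogy with $d1$, only the terms of
+$\delta(1,2)\pmb\phi(2)$ proportional to $\bar\theta_2\theta_2$ survive the
+integration, and collecting them using the anticommutation rules gives
+\begin{equation}
+ \delta(1,2)\pmb\phi(2)
+ =\bar\theta_2\theta_2\big(\mathbf x+\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}\big)+\cdots
+ \quad\Longrightarrow\quad
+ \int d2\,\delta(1,2)\pmb\phi(2)=\pmb\phi(1)
+\end{equation}
+where the omitted terms do not contain $\bar\theta_2\theta_2$.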
+Integrals involving superfields contracted into such operators result in schematically familiar expressions, like that of the standard Gaussian:
+\begin{equation}
+ \int d\pmb\phi\,e^{\int\,d1\,d2\,\pmb\phi(1)^TM(1,2)\pmb\phi(2)}
+ =(\operatorname{sdet}M)^{-N/2}
+\end{equation}
+where the superdeterminant plays the usual role of the determinant.
+The superdeterminant can be defined using the ordinary determinant by writing a
+block version of the matrix $M$: if $\mathbf e(1)=\{1,\bar\theta_1\theta_1\}$ is
+the basis of the even subspace of the superspace and $\mathbf
+f(1)=\{\bar\theta_1,\theta_1\}$ is that of the odd subspace, then we can form a
+block representation of $M$, in analogy with the matrix elements of an operator in quantum mechanics, by
+\begin{equation}
+ \int d1\,d2\,\begin{bmatrix}
+ \mathbf e(1)M(1,2)\mathbf e(2)^T
+ &
+ \mathbf e(1)M(1,2)\mathbf f(2)^T
+ \\
+ \mathbf f(1)M(1,2)\mathbf e(2)^T
+ &
+ \mathbf f(1)M(1,2)\mathbf f(2)^T
+ \end{bmatrix}
+ =\begin{bmatrix}
+ A & B \\ C & D
+ \end{bmatrix}
+\end{equation}
+Then the superdeterminant of $M$ is given by
+\begin{equation}
+ \operatorname{sdet}M=\det(A-BD^{-1}C)\det(D)^{-1}
+\end{equation}
+which is the same as the familiar formula $\det(A-BD^{-1}C)\det D$ for the
+determinant of a block matrix, save for the inverse of $\det D$.
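+
+As a small consistency check, take each block to be $1\times1$, with $a$ and
+$d$ nonzero ordinary numbers and $b$ and $c$ Grassmann numbers, so that
+$bc=-cb$ and $(bc)^2=0$. Then
+\begin{equation}
+ \operatorname{sdet}M
+ =\Big(a-\frac{bc}{d}\Big)\frac1d
+ =\frac ad-\frac{bc}{d^2}
+ =\frac{a}{d-ca^{-1}b}
+\end{equation}
+where the last equality follows from expanding
+$(d-cb/a)^{-1}=d^{-1}\big(1+cb/(ad)\big)$. It agrees with the equivalent form
+$\operatorname{sdet}M=\det A\,\det(D-CA^{-1}B)^{-1}$, valid when $A$ is
+invertible. Were $b$ and $c$ ordinary numbers, the block determinant would
+instead give $ad-bc$.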
+
\section{Complexity of dominant optima in the least-squares problem}
\label{sec:dominant.complexity}