author Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-06-26 17:09:28 +0200
committer Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-06-26 17:09:28 +0200
commit 455e0b9de4e7ff101eb0e290ddcb559be9d1a020
tree 0dec763dffd3d9f082fa9f4c62808f7235c2e490 /marginal.tex
parent 90f728c17b2477aa9714fce514f93177bc802257
Lots of fixes, new figure.
Diffstat (limited to 'marginal.tex')
-rw-r--r-- marginal.tex 398
1 files changed, 252 insertions, 146 deletions
diff --git a/marginal.tex b/marginal.tex
index bc7c4ab..408a1da 100644
--- a/marginal.tex
+++ b/marginal.tex
@@ -250,7 +250,7 @@ $Q^{\alpha\beta}=\frac1N\mathbf s^\alpha\cdot\mathbf s^\beta$. This gives
e^{N\mathcal U_\mathrm{GOE}(\hat\lambda,Q\mid\beta,\lambda^*,\mu)}
\end{equation}
where the effective action is given by
-\begin{equation}
+\begin{equation} \label{eq:goe.action}
\begin{aligned}
&\mathcal U_\textrm{GOE}(\hat\lambda, Q\mid\beta,\lambda^*,\mu)
=\hat\lambda(\lambda^*-\mu)-m\beta\mu \\
@@ -288,7 +288,7 @@ $m$ to zero, we arrive at
with the effective action
\begin{equation}
\begin{aligned}
- &\mathcal U_\mathrm{GOE}(\hat\lambda,q_0,\tilde q_0\mid\mu,\lambda^*,\beta) \\
+ &\mathcal U_\mathrm{GOE}(\hat\lambda,q_0,\tilde q_0\mid\beta,\lambda^*,\mu) \\
&\quad=\hat\lambda(\lambda^*-\mu)+\sigma^2\left[
2\beta^2(q_0^2-\tilde q_0^2)+2\beta\hat\lambda(1-\tilde q_0^2)+\hat\lambda^2
\right] \\
@@ -385,6 +385,7 @@ large-deviation functions will lie on the singular boundary between a purely
real and complex value.
\subsection{Conditioning on a pseudogap}
+\label{sec:pseudogap}
We have seen that this method effectively conditions a random matrix ensemble
on its lowest eigenvalue being zero. However, this does not correspond on its
@@ -444,38 +445,57 @@ geometry of such landscapes is studied by their complexity, or the average
logarithm of the number of stationary points with certain properties, e.g., of
marginal minima at a given energy.
-Such problems can be studied using the method of Lagrange multipliers, with one introduced for every constraint. If the configuration space is defined by $r$ constraints, then the problem is to extremize the Lagrangian
+Such problems can be studied using the method of Lagrange multipliers, with one
+introduced for every constraint. If the configuration space is defined by $r$
+constraints, then the problem of identifying stationary points is reduced to
+extremizing the Lagrangian
\begin{equation}
L(\mathbf x,\pmb\omega)=H(\mathbf x)+\sum_{i=1}^r\omega_ig_i(\mathbf x)
\end{equation}
-with respect to $\mathbf x$ and $\pmb\omega=\{\omega_1,\ldots,\omega_r\}$. The corresponding gradient and Hessian for the problem are
+with respect to $\mathbf x$ and the Lagrange multipliers
+$\pmb\omega=\{\omega_1,\ldots,\omega_r\}$. The corresponding gradient and
+Hessian for the problem are
\begin{align}
- \nabla H(\mathbf x,\pmb\omega)
- &=\partial L(\mathbf x,\pmb\omega)
+ &\nabla H(\mathbf x,\pmb\omega)
+ =\partial L(\mathbf x,\pmb\omega)
=\partial H(\mathbf x)+\sum_{i=1}^r\omega_i\partial g_i(\mathbf x)
\\
- \operatorname{Hess}H(\mathbf x,\pmb\omega)
- &=\partial\partial L(\mathbf x,\pmb\omega)
- =\partial\partial H(\mathbf x)+\sum_{i=1}^r\omega_i\partial\partial g_i(\mathbf x)
+ &\begin{aligned}
+ \operatorname{Hess}H(\mathbf x,\pmb\omega)
+ &=\partial\partial L(\mathbf x,\pmb\omega) \\
+ &=\partial\partial H(\mathbf x)+\sum_{i=1}^r\omega_i\partial\partial g_i(\mathbf x)
+ \end{aligned}
\end{align}
-The number of stationary points in a landscape for a particular realization $H$ is found by integrating over the Kac--Rice measure
+where $\partial=\frac\partial{\partial\mathbf x}$ will always represent the
+derivative with respect to the vector argument $\mathbf x$. The number of
+stationary points in a landscape for a particular function $H$ is found by
+integrating over the Kac--Rice measure
\begin{equation} \label{eq:kac-rice.measure}
- d\nu_H(\mathbf x,\pmb\omega)=d\mathbf x\,d\pmb\omega\,\delta\big(\nabla H(\mathbf x,\pmb\omega)\big)\,\delta\big(\mathbf g(\mathbf x)\big)\,\big|\det\operatorname{Hess}H(\mathbf x,\pmb\omega)\big|
+ \begin{aligned}
+ &d\nu_H(\mathbf x,\pmb\omega) \\
+ &\quad=
+ d\mathbf x\,d\pmb\omega\,\delta\big(\mathbf g(\mathbf x)\big)
+ \,\delta\big(\nabla H(\mathbf x,\pmb\omega)\big)
+ \,\big|\det\operatorname{Hess}H(\mathbf x,\pmb\omega)\big|
+ \end{aligned}
\end{equation}
with a $\delta$-function of the gradient and the constraints ensuring that we
-count valid stationary points, and the Hessian entering in the determinant as
+count valid stationary points, and the determinant of the Hessian serving as
the Jacobian of the argument to the $\delta$-function. It is usually more
interesting to condition the count on interesting properties of the stationary
-points, like the energy and spectrum trace,
+points, like the energy and spectrum trace, or
\begin{equation} \label{eq:kac-rice.measure.2}
\begin{aligned}
&d\nu_H(\mathbf x,\pmb\omega\mid E,\mu) \\
- &\quad=d\nu_H(\mathbf x,\pmb\omega)\,
+ &=d\nu_H(\mathbf x,\pmb\omega)\,
\delta\big(NE-H(\mathbf x)\big)
\,\delta\big(N\mu-\operatorname{Tr}\operatorname{Hess}H(\mathbf x,\pmb\omega)\big)
\end{aligned}
\end{equation}
-We further want to control the value of the minimum eigenvalue of the Hessian at the stationary points. Using the method introduced above, we can write the number of stationary points with energy $E$, Hessian trace $\mu$, and smallest eigenvalue $\lambda^*$ as
+We further want to control the value of the minimum eigenvalue of the Hessian
+at the stationary points. Using the method introduced in Section
+\ref{sec:eigenvalue}, we can write the number of stationary points with energy
+$E$, Hessian trace $\mu$, and smallest eigenvalue $\lambda^*$ as
\begin{widetext}
\begin{equation}
\begin{aligned}
@@ -487,41 +507,56 @@ We further want to control the value of the minimum eigenvalue of the Hessian at
\delta\big(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x,\pmb\omega)\mathbf s\big)
\end{aligned}
\end{equation}
-where the $\delta$-functions
+where the additional $\delta$-functions
\begin{equation}
\delta(\mathbf s^T\partial\mathbf g(\mathbf x))
=\prod_{i=1}^r\delta(\mathbf s^T\partial g_i(\mathbf x))
\end{equation}
ensure that the integrals are constrained to the tangent space of the
-configuration manifold at the point $\mathbf x$. This likewise allows us to
-define the complexity of points with a specific energy, stability, and minimum eigenvalue as
+configuration manifold at the point $\mathbf x$. The complexity of points with
+a specific energy, stability, and minimum eigenvalue is defined as the average
+over functions $H$ of the logarithm of the number $\mathcal N_H$ of stationary
+points, or
\begin{equation}
\Sigma_{\lambda^*}(E,\mu)
=\frac1N\overline{\log\mathcal N_H(E,\mu,\lambda^*)}
\end{equation}
In practice, this can be computed by introducing replicas to treat the
logarithm ($\log x=\lim_{n\to0}\frac\partial{\partial n}x^n$) and replicating
-again to treat each of the normalizations in the numerator ($x^{-1}=\lim_{m\to-1}x^m$). This leads to the expression
+again to treat each of the normalizations in the numerator
+($x^{-1}=\lim_{m\to-1}x^m$). This leads to the expression
\begin{equation} \label{eq:min.complexity.expanded}
\begin{aligned}
\Sigma_{\lambda^*}(E,\mu)
- &=\lim_{\beta\to\infty}\lim_{n\to0}\frac1N\frac\partial{\partial n}\int\prod_{a=1}^n\Bigg[d\nu_H(\mathbf x_a,\pmb\omega_a\mid E,\mu)\,\delta\big(N\lambda^*-(\mathbf s_a^1)^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\mathbf s_a^1\big)\\
- &\hspace{12em}\times\lim_{m_a\to0}
+ &=\lim_{\beta\to\infty}\lim_{n\to0}\frac1N\frac\partial{\partial n}
+ \int\prod_{a=1}^n\overline{\Bigg[d\nu_H(\mathbf x_a,\pmb\omega_a\mid E,\mu)\,\delta\big(N\lambda^*-(\mathbf s_a^1)^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\mathbf s_a^1\big)}\\
+ &\hspace{12em}\overline{\times\lim_{m_a\to0}
\left(\prod_{\alpha=1}^{m_a} d\mathbf s_a^\alpha
\,\delta\big(N-\|\mathbf s_a^\alpha\|^2\big)
\,\delta\big((\mathbf s_a^\alpha)^T\partial\mathbf g(\mathbf x_a)\big)
\,e^{-\beta(\mathbf s_a^\alpha)^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\mathbf s_a^\alpha}\right)
- \Bigg]
+ \Bigg]}
\end{aligned}
\end{equation}
\end{widetext}
for the complexity of stationary points of a given energy, trace, and smallest eigenvalue.
-Finally, the \emph{marginal} complexity is given by fixing $\mu=\mu_\text{m}$ so that the complexity is stationary with respect to changes in the value of the minimum eigenvalue, or
+The marginal complexity follows from the complexity as a function of $\mu$ and
+$\lambda^*$ in an analogous way to Section \ref{sec:pseudogap}. In general, one
+sets $\lambda^*=0$ and tunes $\mu$ from a sufficiently large value until the
+complexity develops an imaginary component, which corresponds to the bulk of
+the spectrum touching zero. The value $\mu=\mu_\mathrm m$ that satisfies this
+is the marginal stability.
+
+In the cases studied here with zero signal-to-noise, a simpler approach is
+possible. The marginal stability $\mu=\mu_\text{m}$ can be identified by
+requiring that the complexity is stationary with respect to changes in the
+value of the minimum eigenvalue $\lambda^*$, or
\begin{equation}
0=\frac\partial{\partial\lambda^*}\Sigma_{\lambda^*}(E,\mu_\text{m}(E))\bigg|_{\lambda^*=0}
\end{equation}
-Finally, the marginal complexity is defined by evaluating the complexity conditioned on $\lambda_{\text{min}}=0$ at $\mu=\mu_\text{m}(E)$,
+The marginal complexity follows by evaluating the complexity conditioned on
+$\lambda_{\text{min}}=0$ at the marginal stability $\mu=\mu_\text{m}(E)$,
\begin{equation}
\Sigma_\text{m}(E)
=\Sigma_0(E,\mu_\text m(E))
@@ -551,27 +586,29 @@ expressed using their Fourier representation, with
\end{aligned}
\end{align}
To do this we have introduced auxiliary fields $\hat{\mathbf x}_a$,
-$\hat\beta_a$, and $\hat\lambda_a$. Since the permutation symmetry of vector
-elements is preserved in \textsc{rsb} order, the order parameters $\hat\beta$
+$\hat\beta_a$, and $\hat\lambda_a$. Since the permutation symmetry of replica vectors
+is preserved in \textsc{rsb} orders, the order parameters $\hat\beta$
and $\hat\lambda$ will quickly lose their indices, since they will ubiquitously
-be constant over the replicas at the eventual saddle point solution.
+be constant over the replica index at the eventual saddle point solution.
We would like to make a similar treatment of the determinant of the Hessian
that appears in \eqref{eq:kac-rice.measure}. The standard approach is to drop
the absolute value function around the determinant. This can potentially lead
to severe problems with the complexity. However, it is a justified step when
-the parameters of the problem, i.e., $E$, $\mu$, and $\lambda^*$ put us in a
+the parameters of the problem, i.e., $E$, $\mu$, and $\lambda^*$, put us in a
regime where the exponential majority of stationary points have the same index.
This is true for maxima and minima, and for saddle points whose spectra have a
strictly positive bulk with a fixed number of negative outliers. It is in
particular a safe operation for this problem of marginal minima, which lie
-right at the edge of disaster. Dropping the absolute value sign allows us to
-write
+right at the edge of disaster.
+
+Dropping the absolute value sign allows us to write
\begin{equation} \label{eq:determinant}
\det\operatorname{Hess}H(\mathbf x_a, \pmb\omega_a)
- =\int d\bar{\pmb\eta}_a\,d\pmb\eta_a\,e^{-\bar{\pmb\eta}_a^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\pmb\eta_a}
+ =\int d\bar{\pmb\eta}_a\,d\pmb\eta_a\,
+ e^{-\bar{\pmb\eta}_a^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega_a)\pmb\eta_a}
\end{equation}
-for $N$-dimensional Grassmann variables $\bar{\pmb\eta}_a$ and $\pmb\eta_a$. For
+for $N$-dimensional Grassmann vectors $\bar{\pmb\eta}_a$ and $\pmb\eta_a$. For
the spherical models this step is unnecessary, since there are other ways to
treat the determinant keeping the absolute value signs, as in previous works
\cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}. However, other of
@@ -584,11 +621,11 @@ than that of the constraint functions $\partial\partial g_i$. The result is that
\begin{equation}
\mu
=\frac1N\operatorname{Tr}\operatorname{Hess}H(\mathbf x)
- =\frac1N\sum_{i=1}^r\omega_i\partial\partial g_i(\mathbf x)
+ =\frac1N\sum_{i=1}^r\omega_i\operatorname{Tr}\partial\partial g_i(\mathbf x)
+O(N^{-1})
\end{equation}
In particular, here we study only cases with quadratic $g_i$, which results in
-an expression relating $\mu$ and the $\omega_i$ that is independent of $\mathbf
+a linear expression relating $\mu$ and the $\omega_i$ that is independent of $\mathbf
x$. Since $H$ contains the disorder of the problem, this simplification means
that the effect of fixing the trace is independent of the disorder and only
depends on properties of the constraint manifold.
@@ -603,7 +640,7 @@ The use of superspace in the Kac--Rice calculation is well established, as well
as the deep connections with BRST symmetry that are implied.
Appendix~\ref{sec:superspace} introduces the notation and methods of
superspace. Here we describe how it can be used to simplify the complexity
-calculation in the marginal case.
+calculation for marginal minima.
We consider the $\mathbb R^{N|4}$ superspace whose Grassmann indices are
$\bar\theta_1,\theta_1,\bar\theta_2,\theta_2$. Consider the supervector defined
@@ -624,18 +661,18 @@ smallest eigenvalue $\lambda^*$ can be written as
\begin{equation}
\begin{aligned}
\mathcal N_H(E,\mu,\lambda^*)^n
- &=\lim_{\beta\to\infty}\int\prod_{a=1}^nd\pmb\omega_a\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_a^\alpha
+ &=\lim_{\beta\to\infty}\int d\pmb\omega\,d\hat\beta\,d\hat\lambda\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_a^\alpha
\exp\left\{
- \delta^{\alpha1}N(\hat\beta_aE+\hat\lambda_a\lambda^*)
- +\int d1\,d2\,B_a^\alpha(1,2)L(\pmb\phi_a^\alpha(1,2),\pmb\omega_a)
+ \delta^{\alpha1}N(\hat\beta E+\hat\lambda\lambda^*)
+ +\int d1\,d2\,B^\alpha(1,2)L\big(\pmb\phi_a^\alpha(1,2),\pmb\omega\big)
\right\}
\end{aligned}
\end{equation}
Here we have also defined the operator
\begin{equation}
- B_a^\alpha(1,2)=\delta^{\alpha1}\bar\theta_2\theta_2
- (1-\hat\beta_a\bar\theta_1\theta_1)
- -\delta^{\alpha1}\hat\lambda_a-\beta
+ B^\alpha(1,2)=\delta^{\alpha1}\bar\theta_2\theta_2
+ (1-\hat\beta\bar\theta_1\theta_1)
+ -\delta^{\alpha1}\hat\lambda-\beta
\end{equation}
which encodes various aspects of the complexity problem, and the measures
\begin{align}
@@ -649,7 +686,8 @@ which encodes various aspects of the complexity problem, and the measures
d\mathbf s_a^\alpha\,\delta(\|\mathbf s_a^\alpha\|^2-N)\,
\delta\big((\mathbf s_a^\alpha)^T\partial\mathbf g(\mathbf x_a)\big)
\\
- d\pmb\omega_a&=\prod_{i=1}^rd\omega_{ai}\,\delta\big(N\mu-\omega_{ai}\partial\partial g_i(\mathbf x_a)\big)
+ d\pmb\omega&=\bigg(\prod_{i=1}^rd\omega_i\bigg)
+ \,\delta\bigg(N\mu-\sum_i^r\omega_i\operatorname{Tr}\partial\partial g_i\bigg)
\end{align}
that collect the individual measures of the various fields embedded in the superfield.
\end{widetext}
@@ -657,7 +695,7 @@ With this way of writing the replicated count, the problem of marginal
complexity temporarily takes the schematic form of an equilibrium calculation
with configurations $\pmb\phi$, inverse temperature $B$, and energy $L$. This
makes the intermediate pieces of the calculation dramatically simpler. Of
-course the complexity of the underlying problem is not banished: near the end
+course the intricacies of the underlying problem are not banished: near the end
of the calculation, terms involving the superspace must be expanded.
\section{Examples}
@@ -676,9 +714,12 @@ between stationary points. Finally, in Section \ref{sec:least.squares} we analyz
\label{sec:ex.spherical}
The spherical spin glasses are a family of models that encompass every
-isotropic Gaussian field on the hypersphere defined by all $\mathbf x\in\mathbb R^N$ such that $0=\mathbf x^T\mathbf x-N$. One can consider the models as defined by centered Gaussian functions $H$ such that the covariance between two points in the configuration space is
+isotropic Gaussian field on the hypersphere defined by all $\mathbf x\in\mathbb
+R^N$ such that $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$. One can consider the models as
+defined by centered Gaussian functions $H$ such that the covariance between two
+points in the configuration space is
\begin{equation}
- \overline{H(\mathbf x)H(\mathbf x')}=Nf\left(\frac{\mathbf x^T\mathbf x'}N\right)
+ \overline{H(\mathbf x)H(\mathbf x')}=Nf\left(\frac{\mathbf x\cdot\mathbf x'}N\right)
\end{equation}
for some function $f$ with positive series coefficients. Such functions can be considered to be made up of all-to-all tensorial interactions, with
\begin{equation}
@@ -697,15 +738,18 @@ Hessian is statistically independent of the gradient and energy
mostly independently from the problem of counting stationary points. Second, in
these models the Hessian at every point in the landscape belongs to the GOE
class with the same width of the spectrum $\mu_\mathrm m=2\sqrt{f''(1)}$.
-Therefore, all marginal optima in these systems have the same constant shift
-$\mu=\pm\mu_\mathrm m$. Despite the fact the complexity of marginal optima is
+Therefore, all marginal minima in these systems have the same constant shift
+$\mu=\mu_\mathrm m$. Although the complexity of marginal minima is
well known by simpler methods, it is instructive to carry through the
calculation for this case, since we will learn something about its application in
more nontrivial settings.
The procedure to treat the complexity of the spherical models has been described in
detail elsewhere \cite{Kent-Dobias_2023_How}. Here we give only a sketch of the
-steps involved. First the substitutions \eqref{eq:delta.grad},
+steps involved. First we notice that
+$\mu=\frac1N\omega\operatorname{Tr}\partial\partial g(\mathbf x)=\omega$, so
+that the only Lagrange multiplier $\omega$ in this problem is set directly to
+the shift $\mu$. The substitutions \eqref{eq:delta.grad},
\eqref{eq:delta.energy}, and \eqref{eq:delta.eigen} are made to convert the
Dirac $\delta$ functions into exponential integrals, and the substitution
\eqref{eq:determinant} is made to likewise convert the determinant.
@@ -713,18 +757,18 @@ Dirac $\delta$ functions into exponential integrals, and the substitution
Once these substitutions have been made, the entire expression
\eqref{eq:min.complexity.expanded} is an exponential integral whose argument is
a linear functional of $H$. This allows for the average to be taken over the
-disorder. If we gather all the $H$-dependant pieces into the linear functional
-$\mathcal O$ then the average gives
+disorder. If we gather all the $H$-dependent pieces associated with replica $a$
+into the linear functional $\mathcal O_a$ then the average gives
\begin{equation}
\begin{aligned}
\overline{
e^{\sum_a^n\mathcal O_aH(\mathbf x_a)}
}
&=e^{\frac12\sum_a^n\sum_b^n\mathcal O_a\mathcal O_b\overline{H(\mathbf x_a)H(\mathbf x_b)}} \\
- &=e^{N\frac12\sum_a^n\sum_b^n\mathcal O_a\mathcal O_bf\big(\frac{\mathbf x_a^T\mathbf x_b}N\big)}
+ &=e^{N\frac12\sum_a^n\sum_b^n\mathcal O_a\mathcal O_bf\big(\frac{\mathbf x_a\cdot\mathbf x_b}N\big)}
\end{aligned}
\end{equation}
-The result is an integral that only depends on the many vector variables we
+The result is an integrand that only depends on the many vector variables we
have introduced through their scalar products with each other. We therefore make a change of variables in the integration from those vectors to matrices that encode their possible scalar products. These matrices are
\begin{equation} \label{eq:order.parameters}
\begin{aligned}
@@ -750,7 +794,7 @@ This transformation changes the measure of the integral, with
&\quad=dC\,dR\,dD\,dG\,dQ\,dX\,d\hat X\,(\det J)^{N/2}(\det G)^{-N}
\end{aligned}
\end{equation}
-where $J$ is the Jacobian of the transformation and takes the form
+where $J$ is the Jacobian of the transformation in the real-valued fields and takes the form
\begin{equation} \label{eq:coordinate.jacobian}
J=\begin{bmatrix}
C&iR&X_1&\cdots&X_n \\
@@ -761,12 +805,13 @@ where $J$ is the Jacobian of the transformation and takes the form
\end{bmatrix}
\end{equation}
and the contribution of the Grassmann integrals produces its own inverted
-Jacobian. The block matrices indicated above are such that $A_{ab}$ is an
+Jacobian. The block matrices indicated above are such that $Q_{ab}$ is an
$m_a\times m_b$ matrix indexed by the upper indices, while $X_a$ is an $n\times
m_a$ matrix with one lower and one upper index.
After these steps, which follow identically to those more carefully outlined in
-the cited papers \cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}, we arrive at a form of the integral as over an effective action
+the cited papers \cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}, we arrive
+at a form for the complexity of
\begin{widetext}
\begin{equation} \label{eq:spherical.complexity}
\begin{aligned}
@@ -777,9 +822,9 @@ the cited papers \cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}, we arrive
\exp\Bigg\{
nN\mathcal S_\mathrm{SSG}(\hat\beta,C,R,D,G\mid E,\mu) \\
&\qquad
- +nN\mathcal U_\mathrm{SSG}(\hat\lambda,C,Q,X,\hat X\mid\beta)
+ +nN\mathcal U_\mathrm{SSG}(\hat\lambda,Q,X,\hat X\mid\beta,\lambda^*,\mu,C)
+\frac N2\log\det\left[
- I+\begin{bmatrix}
+ I-\begin{bmatrix}
Q_{11}&\cdots&Q_{1n}\\
\vdots&\ddots&\vdots\\
Q_{n1}&\cdots&Q_{nn}
@@ -800,21 +845,14 @@ the cited papers \cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}, we arrive
\Bigg\}
\end{aligned}
\end{equation}
-where the matrix $J$ is the Jacobian associated with the change of variables
-from the $\mathbf x$, $\hat{\mathbf x}$, and $\mathbf s$, and has the form
-The structure of the integrand, with the effective action split between two
-terms which only share a dependence on the Lagrange multiplier $\omega$ that
-enforces the constraint, is generic to Gaussian problems. This is the
-appearance in practice of the fact mentioned before that conditions on the
-Hessian do not mostly effect the rest of the complexity problem.
-
-The effective action $\mathcal S_\mathrm{SSG}$ is precisely that for the
-ordinary complexity of stationary points, or
+The exponential integrand is split into two effective actions coupled only by a
+residual determinant. The first of these actions is the usual effective action
+for the complexity of the spherical spin glasses, or
\begin{equation}
\begin{aligned}
&\mathcal S_\mathrm{SSG}(\hat\beta,C,R,D,G\mid E,\mu)
=\hat\beta E-(r_d+g_d)\mu \\
- &+\frac1n\left\{\frac12\sum_{ab}\left(
+ &+\lim_{n\to0}\frac1n\left\{\frac12\sum_{ab}\left(
\hat\beta^2f(C_{ab})
+\big(2\hat\beta R_{ab}-D_{ab}\big)f'(C_{ab})
+(R_{ab}^2-G_{ab}^2)f''(C_{ab})
@@ -824,11 +862,13 @@ ordinary complexity of stationary points, or
\end{aligned}
\end{equation}
where $r_d$ and $g_d$ are the diagonal elements of $R$ and $G$, respectively.
+The second of these actions is analogous to \eqref{eq:goe.action} and contains
+the contributions from the marginal pieces of the calculation, and is given by
\begin{equation}
\begin{aligned}
- &\mathcal U_\mathrm{SSG}(\hat\lambda,Q,X,\hat X\mid\lambda^*,\mu,C)
+ &\mathcal U_\mathrm{SSG}(\hat\lambda,Q,X,\hat X\mid\beta,\lambda^*,\mu,C)
=\hat\lambda\lambda^*
- +\frac1n\Bigg\{
+ +\lim_{n\to0}\lim_{m_1\cdots m_n\to0}\frac1n\Bigg\{
\frac12\log\det Q+
\sum_{a=1}^n\bigg(
\sum_{\alpha=1}^{m_a}\beta\mu Q_{aa}^{\alpha\alpha}
@@ -854,6 +894,11 @@ where $r_d$ and $g_d$ are the diagonal elements of $R$ and $G$, respectively.
\end{aligned}
\end{equation}
\end{widetext}
+The fact that the complexity can be split into two relatively independent
+pieces in this way is a characteristic of the Gaussian nature of the spherical
+spin glass. In Section \ref{sec:least.squares} we will study a model whose
+energy is not Gaussian and where such a decomposition is impossible.
+
There are some dramatic simplifications that emerge from the structure of this
particular problem. First, notice that (outside of the `volume' term due to
$J$) the dependence on the parameters $X$ and $\hat X$ is purely quadratic.
@@ -861,19 +906,23 @@ Therefore, there will always be a saddle point condition where they are both
zero. In this case, we expect this solution to be correct. We can reason about
why this is so: $X$, for instance, quantifies the correlation between the
typical position of stationary points and the direction of their typical
-eigenvectors. In an isotropic landscape, where no direction is any more
-important than any other, we don't expect such correlations to be nonzero:
-where a state is location does not give any information as to the orientation
+eigenvectors. In a landscape without a signal, where no direction is any more
+important than any other, we expect such correlations to be zero:
+where a state is located does not give any information as to the orientation
of its soft directions. On the other hand, in the spiked case, or with an
external field, the preferred direction can polarize both the direction of
typical stationary points \emph{and} their soft eigenvectors. Therefore, in
these instances one must account for solutions with nonzero $X$ and $\hat X$.
+We further expect that $Q_{ab}=0$ for $a\neq b$. For the contrary to be true,
+eigenvectors at independently sampled stationary points would need to have
+their directions correlated. This is expected in situations with a signal,
+where such correlations would be driven by a shared directional bias towards
+the signal direction. In the present situation, where there is no signal, such
+correlations do not exist.
-
-When we take $X=\hat X=0$, $Q^{\alpha\beta}_{ab}=\delta_{ab}Q^{\alpha\beta}$
-independent, and $Q$ to have the planted replica symmetric form of
-\eqref{eq:Q.structure}, we find that
+When we take $X=\hat X=0$ and
+$Q^{\alpha\beta}_{ab}=\delta_{ab}Q^{\alpha\beta}$, we find that
\begin{equation}
\mathcal U_\mathrm{SSG}(\hat\lambda,Q,0,0\mid\beta,\lambda^*,\mu,C)
=\mathcal U_\mathrm{GOE}(\hat\lambda,Q\mid\beta,\lambda^*,\mu)
@@ -881,28 +930,37 @@ independent, and $Q$ to have the planted replica symmetric form of
with $\sigma^2=f''(1)$. That is, the effective action for the terms related to
fixing the eigenvalue in the spherical Kac--Rice problem is exactly the same as
that for the \textrm{GOE} problem. This is perhaps not so surprising, since we
-established from the beginning that the Hession of the spherical spin glasses
+established from the beginning that the Hessian of the spherical spin glasses
belongs to the GOE class.
+The remaining analysis of the eigenvalue-dependent part $\mathcal
+U_\mathrm{SSG}$ follows precisely the same steps as were made in Section
+\ref{sec:shifted.GOE} for the GOE example. The result of the calculation is
+also the same: the exponential factor containing $\mathcal U_\mathrm{SSG}$
+produces precisely the large deviation function $G_{\lambda^*}(\mu)$ of
+\eqref{eq:goe.large.dev} (again with $\sigma^2=f''(1)$). The remainder of the
+integrand depending on $\mathcal S_\mathrm{SSG}$ produces the ordinary
+complexity of the spherical spin glasses without conditions on the Hessian
+eigenvalue. We therefore find that
\begin{equation}
\Sigma_{\lambda^*}(E,\mu)
=\Sigma(E,\mu)+G_{\lambda^*}(\mu)
\end{equation}
-where $G$ is precisely the function \eqref{eq:goe.large.dev} we found in the
-case of a GOE matrix added to an identity, with $\sigma=\sqrt{f''(1)}$. We find the marginal complexity by solving
+We find the marginal complexity by solving
\begin{equation}
0
=\frac\partial{\partial\lambda^*}\Sigma_{\lambda^*}(E,\mu_\mathrm m(E))\bigg|_{\lambda^*=0}
=\frac\partial{\partial\lambda^*}G_{\lambda^*}(\mu_\mathrm m(E))\bigg|_{\lambda^*=0}
\end{equation}
-which gives $\mu_m(E)=2\sqrt{f''(1)}$ independent of $E$, as we presaged above. Since $G_0(\mu_\mathrm m)=0$, this gives finally
+which gives $\mu_\mathrm m(E)=2\sigma=2\sqrt{f''(1)}$ independent of $E$, as we
+presaged above. Since $G_0(\mu_\mathrm m)=0$, this gives finally
\begin{equation}
\Sigma_\mathrm m(E)
=\Sigma_0(E,\mu_\mathrm m(E))
=\Sigma(E,\mu_\mathrm m)
\end{equation}
that the marginal complexity in these models is simply the ordinary complexity
-evaluated at a fixed trace of the Hessian.
+evaluated at a fixed trace $\mu_\mathrm m$ of the Hessian.
\subsection{Multispherical spin glasses}
\label{sec:multispherical}
@@ -916,23 +974,23 @@ energy is given by
H(\mathbf x)=H_1(\mathbf x^{(1)})+H_2(\mathbf x^{(2)})-\epsilon\mathbf x^{(1)}\cdot\mathbf x^{(2)}
\end{equation}
for $\mathbf x=[\mathbf x^{(1)},\mathbf x^{(2)}]$ with components $\mathbf
-x^{(1)},\mathbf x^{(2)}\in\mathbb R^N$. Each individual sphere energy $H_s$ is
+x^{(1)},\mathbf x^{(2)}\in\mathbb R^N$. The two spheres give rise to two constraints $0=g_1(\mathbf x^{(1)})=\frac12(\|\mathbf x^{(1)}\|^2-N)$ and $0=g_2(\mathbf x^{(2)})=\frac12(\|\mathbf x^{(2)}\|^2-N)$, and similarly two Lagrange multipliers $\omega_1$ and $\omega_2$. Each individual sphere energy $H_i$ is
taken to be a centered Gaussian random function with a covariance given in the
-usual spherical way by
+usual spherical way for $\mathbf x_1,\mathbf x_2\in\mathbb R^N$ by
\begin{equation}
- \overline{H_i(\pmb\sigma_1)H_j(\pmb\sigma_2)}
- =N\delta_{ij}f_i\left(\frac{\pmb\sigma_1\cdot\pmb\sigma_2}N\right)
+ \overline{H_i(\mathbf x_1)H_j(\mathbf x_2)}
+ =N\delta_{ij}f_i\left(\frac{\mathbf x_1\cdot\mathbf x_2}N\right)
\end{equation}
-with the functions $f_1$ and $f_2$ not necessarily the same. In this problem,
-there is an energetic competition between the independent spin glass energies
-on each sphere and their tendency to align or anti-align through the
-interaction term.
+with the functions $f_1$ and $f_2$ not necessarily the same.
-These models have more often been studied with random fully connected couplings
-between the spheres, for which it is possible to also use configuration spaces
-involving spheres of different sizes \cite{Subag_2021_TAP, Subag_2023_TAP,
-Bates_2022_Crisanti-Sommers, Bates_2022_Free, Huang_2023_Strong,
-Huang_2023_Algorithmic, Huang_2024_Optimization}.
+In this problem, there is an energetic competition between the independent spin
+glass energies on each sphere and their tendency to align or anti-align through
+the interaction term. These models have more often been studied with random
+fully connected couplings between the spheres, for which it is possible to also
+use configuration spaces involving spheres of different sizes
+\cite{Subag_2021_TAP, Subag_2023_TAP, Bates_2022_Crisanti-Sommers,
+ Bates_2022_Free, Huang_2023_Strong, Huang_2023_Algorithmic,
+Huang_2024_Optimization}.
Because the energy is Gaussian, properties of the Hessian are once again
statistically independent of those of the energy and gradient. However, unlike
@@ -942,31 +1000,37 @@ in this problem through the presence of a configuration space defined by
multiple constraints, and therefore multiple Lagrange multipliers are necessary
to ensure they are all fixed. The resulting Lagrangian, gradient, and Hessian are
\begin{align}
- L(\mathbf x)&=H(\mathbf x)
- +\frac12\omega^{(1)}\big(\|\mathbf x^{(1)}\|^2-N\big) \\
- &\qquad\qquad\qquad+\frac12\omega^{(2)}\big(\|\mathbf x^{(2)}\|^2-N\big)
+ &\begin{aligned}
+ L(\mathbf x)&=H(\mathbf x)
+ +\frac12\omega_1\big(\|\mathbf x^{(1)}\|^2-N\big) \\
+ &\qquad\qquad\qquad+\frac12\omega_2\big(\|\mathbf x^{(2)}\|^2-N\big)
+ \end{aligned}
\\
- \nabla H(\mathbf x,\pmb\omega)
- &=\partial H(\mathbf x)+\begin{bmatrix}
- \omega^{(1)}\mathbf x^{(1)} \\
- \omega^{(2)}\mathbf x^{(2)}
+ &\nabla H(\mathbf x,\pmb\omega)
+ =
+ \begin{bmatrix}
+ \partial_1H_1(\mathbf x^{(1)})-\epsilon\mathbf x^{(2)}+\omega_1\mathbf x^{(1)} \\
+ \partial_2H_2(\mathbf x^{(2)})-\epsilon\mathbf x^{(1)}+\omega_2\mathbf x^{(2)}
\end{bmatrix}
\\
- \operatorname{Hess}H(\mathbf x,\pmb\omega)
- &=\partial\partial H(\mathbf x)+\begin{bmatrix}
- \omega^{(1)}I&0 \\
- 0&\omega^{(2)}I
+ &\begin{aligned}
+ &\operatorname{Hess}H(\mathbf x,\pmb\omega) \\
+ &\quad=
+ \begin{bmatrix}
+    \partial_1\partial_1H_1(\mathbf x^{(1)})+\omega_1I&-\epsilon I \\
+    -\epsilon I&\partial_2\partial_2H_2(\mathbf x^{(2)})+\omega_2I
\end{bmatrix}
+ \end{aligned}
\end{align}
Like in the spherical model, fixing the trace of the Hessian to $\mu$ is
equivalent to a constraint on the Lagrange multipliers. However, in this case
-it corresponds to $\mu=\omega^{(1)}+\omega^{(2)}$, and therefore they are not
+it corresponds to $\mu=\omega_1+\omega_2$, and therefore they are not
uniquely fixed by the trace.
Since the energy in the multispherical models is Gaussian, the properties of
the matrix $\partial\partial H$ are again independent of the energy and
gradient. This means that the form of the Hessian is parameterized solely by
-the values of the Lagrange multipliers $\omega^{(1)}$ and $\omega^{(2)}$, just
+the values of the Lagrange multipliers $\omega_1$ and $\omega_2$, just
as $\mu=\omega$ alone parameterized the Hessian in the spherical spin glasses.
Unlike that case, however, the Hessian takes different shapes with different
spectral widths depending on their precise combination. In
@@ -1002,19 +1066,20 @@ stationary points are typically uncorrelated and therefore $X=\hat X=0$ and the
overlaps $Q$ between eigenvectors are only nonzero when in the same replica.
The result has the schematic form of \eqref{eq:spherical.complexity}, but with
different effective actions depending now on overlaps inside each of the two
-spheres and between the two spheres. These are
+spheres and between the two spheres. The effective action for the traditional
+complexity of the multispherical spin glass is
\begin{widetext}
\begin{equation}
\begin{aligned}
- &\mathcal S_\mathrm{MSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11},C^{22},R^{22},D^{22},G^{22},C^{12},R^{12},R^{21},D^{12},G^{12}
- \mid E,\omega_1,\omega_2)= \\
- &\quad
- \mathcal S_\mathrm{SSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11}\mid E_1,\omega_1)
+ &\mathcal S_\mathrm{MSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11},C^{22},R^{22},D^{22},G^{22},C^{12},R^{12},R^{21},D^{12},G^{12},G^{21}
+ \mid E,\omega_1,\omega_2)= \hat\beta(E-E_1-E_2-\epsilon c_d^{12})\\
+ &\quad-\epsilon(r^{12}_d+r^{21}_d+g^{12}_d+g^{21}_d)
+ +\mathcal S_\mathrm{SSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11}\mid E_1,\omega_1)
+\mathcal S_\mathrm{SSG}(\hat\beta,C^{22},R^{22},D^{22},G^{22}\mid E_2,\omega_2)
- -\epsilon(r^{12}_d+r^{21}_d)+\hat\beta(E-E_1-E_2-\epsilon c_d^{12}) \\
+ \\
&\quad
- +\frac12\log\det\left(
- I+
+ +\lim_{n\to0}\frac1n\left\{\frac12\log\det\left(
+ I-
\begin{bmatrix}C^{11}&iR^{11}\\iR^{11}&D^{11}\end{bmatrix}^{-1}
\begin{bmatrix}
C^{12} & iR^{12} \\ iR^{21} & D^{12}
@@ -1024,25 +1089,33 @@ spheres and between the two spheres. These are
C^{12} & iR^{21} \\ iR^{21} & D^{12}
\end{bmatrix}
\right)
- -\log\det(I+(G^{11}G^{22})^{-1}G^{12}G^{21})
+ -\log\det(I-(G^{11}G^{22})^{-1}G^{12}G^{21})\right\}
\end{aligned}
\end{equation}
-and
+which is the sum of two effective actions for the spherical spin glass
+associated with each individual sphere, and some coupling terms. The order
+parameters are defined the same as in the spherical spin glasses, but now with
+raised indices to indicate whether the vectors come from one or the other
+spherical subspace. The effective action for the eigenvalue-dependent part of
+the complexity is likewise given by
\begin{equation}
\begin{aligned}
&\mathcal U_\mathrm{MSG}(\hat q,\hat\lambda,Q^{11},Q^{22},Q^{12}\mid\lambda^*,\omega_1,\omega_2,\beta) \\
- &\sum_a^n\left[\hat q_a(Q^{11}_{aa}+Q^{22}_{aa}-1)-\beta(\omega_1Q^{11}_{aa}+\omega_2Q^{22}_{aa}+2\epsilon Q^{12}_{aa})\right]
- +\hat\lambda(\omega_1Q^{11}_{11}+\omega_2Q^{22}_{11}+2\epsilon Q^{12}_{11}) \\
- &+\sum_{i=1,2}f_i''(1)\left[\beta^2\sum_{ab}^n(Q^{ii}_{ab})^2-2\beta\hat\lambda\sum_a^n(Q^{ii}_{1a})^2+\hat\lambda^2(Q^{ii}_{11})^2\right]
+ &\quad=\lim_{m\to0}\bigg\{\sum_{\alpha=1}^m\left[\hat q^\alpha(Q^{11,\alpha\alpha}+Q^{22,\alpha\alpha}-1)-\beta(\omega_1Q^{11,\alpha\alpha}+\omega_2Q^{22,\alpha\alpha}-2\epsilon Q^{12,\alpha\alpha})\right]
+ +\hat\lambda(\omega_1Q^{11,11}+\omega_2Q^{22,11}-2\epsilon Q^{12,11}) \\
+ &\qquad\qquad+\sum_{i=1,2}f_i''(1)\left[\beta^2\sum_{\alpha\gamma}^m(Q^{ii,\alpha\gamma})^2+2\beta\hat\lambda\sum_\alpha^m(Q^{ii,1\alpha})^2+\hat\lambda^2(Q^{ii,11})^2\right]
+\frac12\log\det\begin{bmatrix}
Q^{11}&Q^{12}\\
Q^{12}&Q^{22}
\end{bmatrix}
+ \bigg\}
\end{aligned}
\end{equation}
\end{widetext}
-where again the problem of fixing marginality has completely separated from
-that of the complexity. The biggest change between this problem and the
+The new variables $\hat q^\alpha$ are Lagrange multipliers introduced
+to enforce the constraint that $Q^{11,\alpha\alpha}+Q^{22,\alpha\alpha}=1$.
+
+The biggest change between this problem and the
spherical one is that now the spherical constraint in the tangent space at each
stationary point gives the constraint on the order parameters
$q^{11}_d+q^{22}_d=1$. Therefore, the diagonal of the $Q$ matrices cannot be
@@ -1058,11 +1131,14 @@ form \eqref{eq:Q.structure}, but with the diagonal not necessarily equal to 1, s
\tilde q^{ij}_0 & q^{ij}_0 & q^{ij}_0 & \cdots & q^{ij}_d
\end{bmatrix}
\end{equation}
-
+This requires us to introduce two new order parameters per pair $(i,j)$. When
+this ansatz is inserted into the expression for the effective action and the
+limit of $m\to0$ is taken, we find
\begin{widetext}
\begin{equation}
\begin{aligned}
- &\sum_{i=1,2}f_i''(1)\left[
+ &\mathcal U_\mathrm{MSG}(\hat q,\hat{\tilde q},\hat\lambda,\tilde q_d^{11},\tilde q_0^{11},q_d^{11},q_0^{11},\tilde q_d^{22},\tilde q_0^{22},q_d^{22},q_0^{22},\tilde q_d^{12},\tilde q_0^{12},q_d^{12},q_0^{12}\mid\lambda^*,\omega_1,\omega_2,\beta) \\
+ &=\sum_{i=1,2}\left\{f_i''(1)\left[
\beta^2\left(
(\tilde q^{ii}_d)^2
-(q^{ii}_d)^2
@@ -1074,14 +1150,9 @@ form \eqref{eq:Q.structure}, but with the diagonal not necessarily equal to 1, s
\right)
+\hat\lambda^2(\tilde q^{ii}_d)^2
\right]
- +\hat\lambda\left(
- \tilde q^{11}_d\omega_1+\tilde q^{22}_d\omega_2+2\tilde q^{12}_d
- \right) \\
- &-\beta\left(
- (\tilde q^{11}_d-q^{11}_d)\omega_1
- +(\tilde q^{22}_d-q^{22}_d)\omega_2
- -2\epsilon(\tilde q^{12}_d-q^{12}_d)
- \right) \\
+ +\hat\lambda\tilde q^{ii}_d\omega_i
+ -\beta(\tilde q^{ii}_d-q^{ii}_d)\omega_i
+ \right\} \\
&+\frac12\log\bigg[
\left(
2q^{12}_0\tilde q^{12}_0-\tilde q^{12}_0(\tilde q^{12}_d+q^{12}_d)
@@ -1103,6 +1174,9 @@ form \eqref{eq:Q.structure}, but with the diagonal not necessarily equal to 1, s
\bigg]
\\
&-\log\left[(q^{11}_d-q^{11}_0)(q^{22}_d-q^{22}_0)-(q^{12}_d-q^{12}_0)^2\right]
+ -2\epsilon\big[\hat\lambda\tilde q^{12}_d
+ -\beta(\tilde q^{12}_d-q^{12}_d)\big]
+ +\hat q(q^{11}_d+q^{22}_d-1)+\hat{\tilde q}(\tilde q^{11}_d+\tilde q^{22}_d-1)
\end{aligned}
\end{equation}
\end{widetext}
@@ -1110,10 +1184,10 @@ To make the limit to zero temperature, we once again need an ansatz for the
asymptotic behavior of the overlaps. These take the form
$q^{ij}_0=q^{ij}_d-y^{ij}_0\beta^{-1}-z^{ij}_0\beta^{-2}$, with the same for
the tilde variables. Notice that in this case, the asymptotic behavior of the
-off diagonal elements is to approach the value of the diagonal rather than one.
+off-diagonal elements is to approach the value of the diagonal rather than to approach one.
We also require $\tilde q^{ij}_d=q^{ij}_d-\tilde y^{ij}_d\beta^{-1}-\tilde
-z^{ij}_d\beta^{-2}$, i.e., that the tilde diagonal term also approaches the
-same diagonal value.
+z^{ij}_d\beta^{-2}$, i.e., that the tilde diagonal terms also approach the
+same diagonal value as the untilde terms, but at potentially different rates.
As before, in order for the volume term to stay finite, there are necessary
constraints on the values $y$. These are
@@ -1127,27 +1201,59 @@ $y$s for the off-diagonal elements to be equal, as in the GOE case. Here, since
the diagonal elements are not necessarily equal, we have a more general
relationship.
+When the $\beta$-dependence of the $q$ variables is inserted into the effective
+action and the limit taken, we find an expression that is too large to report
+here. However, it can be extremized over all of the variables in the problem
+just as in the previous examples to find the values of the Lagrange multipliers
+$\omega_1$ and $\omega_2$ corresponding to marginal minima.
+Fig.~\ref{fig:msg.marg}(a) shows examples of the marginal $\omega_1$ and
+$\omega_2$ for a variety of couplings $\epsilon$ when the covariances of the
+energy on the two spherical subspaces are such that $1=f_1''(1)=f_2''(1)$.
+Fig.~\ref{fig:msg.marg}(b) shows the Hessian spectra associated with some
+specific pairs $(\omega_1,\omega_2)$. When $\epsilon=0$ and the two spheres are
+uncoupled, we find the result for two independent spherical spin glasses. If
+either $\omega_1=2\sqrt{f_1''(1)}=2$ or $\omega_2=2\sqrt{f_2''(1)}=2$ and the other
+Lagrange multiplier is larger than 2, then we have a marginal minimum in the
+uncoupled case, made up of the Cartesian product of a marginal minimum on one
+subspace and a stable minimum on the other.
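+
+As a brief check of this uncoupled statement, and assuming as in the caption
+of Fig.~\ref{fig:msg.marg} that $\sigma_i^2=f_i''(1)$ sets the width of the
+semicircle associated with $\partial_i\partial_iH_i$, the Hessian at
+$\epsilon=0$ is block diagonal and its spectrum is supported on the union
+\begin{equation}
+  [\omega_1-2\sigma_1,\omega_1+2\sigma_1]\cup[\omega_2-2\sigma_2,\omega_2+2\sigma_2]
+\end{equation}
+whose lower edge is $\min_i(\omega_i-2\sigma_i)$. With $\sigma_1=\sigma_2=1$,
+this edge sits at zero exactly when the smaller of $\omega_1$ and $\omega_2$
+equals 2.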
+
\begin{figure}
\includegraphics{figs/msg_marg_legend.pdf}
+ \vspace{1em}
+
\includegraphics{figs/msg_marg_params.pdf}
\hfill
\includegraphics{figs/msg_marg_spectra.pdf}
+ \vspace{1em}
+
+ \includegraphics{figs/msg_marg_complexity.pdf}
+
\caption{
- \textsc{Left}: Values of the Lagrange multipliers $\omega_1$ and $\omega_2$
+ Properties of marginal minima in the multispherical model.
+ \textbf{(a)}~Values of the Lagrange multipliers $\omega_1$ and $\omega_2$
corresponding to a marginal spectrum for multispherical spin glasses with
$\sigma_1^2=f_1''(1)=1$, $\sigma_2^2=f_2''(1)=1$, and various $\epsilon$.
- \textsc{Right}: Spectra corresponding to the parameters $\omega_1$ and
+ \textbf{(b)}~Spectra corresponding to the parameters $\omega_1$ and
$\omega_2$ marked by the circles on the lefthand plot.
+ \textbf{(c)}~The complexity of marginal minima in a multispherical model with
+ $f_i(q)=q^{p_i}/[p_i(p_i-1)]$ for $p_1=3$ and $p_2=4$ for a variety of
+ $\epsilon$. Since $f_1''(1)=f_2''(1)=1$, the marginal values correspond
+ precisely to those in (a--b).
} \label{fig:msg.marg}
\end{figure}
-Fig.~\ref{fig:msg.marg} shows the examples of the Lagrange multipliers
-necessary for marginality in a set of multispherical spin glasses at various
-couplings $\epsilon$, along with some of the corresponding spectra. As
-expected, the method correctly picks out values of the Lagrange multipliers
-that result in marginal spectra.
+Fig.~\ref{fig:msg.marg}(c) shows the complexity of marginal minima in an
+example where both $H_1$ and $H_2$ correspond to pure $p$-spin models, with
+$f_1(q)=\frac16q^3$ and $f_2(q)=\frac1{12}q^4$. Despite having different
+covariance functions, these both satisfy $1=f_1''(1)=f_2''(1)$ and therefore
+have marginal minima for Lagrange multipliers that satisfy the relationships in
+Fig.~\ref{fig:msg.marg}(a). In the uncoupled system with $\epsilon=0$, the most
+common type of marginal stationary point consists of independently marginal
+stationary points in the two subsystems, with $\omega_1=\omega_2=2$. As
+$\epsilon$ is increased, the most common type of marginal minimum drifts toward
+points with $\omega_1>\omega_2$.
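+
+The equality of the two second derivatives used above can be checked directly.
+For pure models with $f_i(q)=q^{p_i}/[p_i(p_i-1)]$, as in
+Fig.~\ref{fig:msg.marg}(c), differentiating twice gives
+\begin{equation}
+  f_i''(q)=q^{p_i-2}
+\end{equation}
+so that $f_i''(1)=1$ for any $p_i\geq2$, including $p_1=3$ and $p_2=4$.
+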
Multispherical spin glasses may be an interesting platform for testing ideas
about which among the possible marginal minima actually attract the dynamics,
@@ -1168,12 +1274,12 @@ In this subsection we consider perhaps the simplest example of a non-Gaussian
landscape: the problem of random nonlinear least squares optimization. Though,
for reasons we will see it is easier to make predictions for random nonlinear
\emph{most} squares, i.e., the problem of maximizing the sum of squared terms.
-We also take a spherical problem with $\mathbf x\in S^{N-1}$, and consider a set
+We also take a spherical configuration space with $\mathbf x\in S^{N-1}$ and $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$ as in the spherical spin glasses, and consider a set
of $M$ random functions $V_k:\mathbf S^{N-1}\to\mathbb R$ that are centered Gaussians with covariance
\begin{equation}
- \overline{V_i(\mathbf x)V_j(\mathbf x')}=\delta_{ij}f\left(\frac{\mathbf x^T\mathbf x'}N\right)
+ \overline{V_i(\mathbf x)V_j(\mathbf x')}=\delta_{ij}f\left(\frac{\mathbf x\cdot\mathbf x'}N\right)
\end{equation}
-The energy or cost function is minus the sum of squares of the $V_k$, or
+The energy is minus the sum of squares of the $V_k$, or
\begin{equation}
H(\mathbf x)=-\frac12\sum_{k=1}^MV_k(\mathbf x)^2
\end{equation}