Progress.

- Cleaned up some redundancies and made notation consistent.
- Wrote a conclusion and smoothed out the introduction.
- Added reference to the companion paper.
author: Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-06-24 22:46:57 +0200
committer: Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-06-24 22:46:57 +0200
commit: b43cb92da15447905f61a6ceca9e3ab32309771d (patch)
tree: 98597b21f73972eceed28b9f45b084e38c29a94e /marginal.tex
parent: f4a853cf1f6d8de4d455e80e42d30913f664c7cc (diff)
download: marginal-b43cb92da15447905f61a6ceca9e3ab32309771d.tar.gz
marginal-b43cb92da15447905f61a6ceca9e3ab32309771d.tar.bz2
marginal-b43cb92da15447905f61a6ceca9e3ab32309771d.zip
1 files changed, 134 insertions, 90 deletions
diff --git a/marginal.tex b/marginal.tex
index 6451147..3a94ca0 100644
--- a/marginal.tex
+++ b/marginal.tex
@@ -8,11 +8,11 @@
 \usepackage[dvipsnames]{xcolor}
 \usepackage[
   colorlinks=true,
-  urlcolor=Black,
-  citecolor=Black,
-  filecolor=Black,
-  linkcolor=Black
-]{hyperref} % ref and cite links with pretty colors
+  urlcolor=BlueViolet,
+  citecolor=BlueViolet,
+  filecolor=BlueViolet,
+  linkcolor=BlueViolet
+]{hyperref}
 
 
 \begin{document}
@@ -47,58 +47,72 @@
 \section{Introduction}
 
 Systems with rugged landscapes are important across many disciplines, from the
-physics to glasses and spin-glasses to the statistical inference problems. The
+physics of glasses and spin-glasses to statistical inference problems. The
 behavior of these systems is best understood when equilibrium or optimal
-solutions are studied and averages can be taken statically over all possible
-configurations. However, such systems are also infamous for their tendency to
-defy equilibrium and optimal expectations in practice, due to the presence of
-dynamic transitions or crossovers that leave physical or algorithmic dynamics
-stuck exploring only a subset of configurations.
+solutions are studied and weighted averages can be taken statically over all
+possible configurations. However, such systems are also infamous for their
+tendency to defy equilibrium and optimal expectations in practice, due to the
+presence of dynamic transitions or crossovers that leave physical or
+algorithmic dynamics stuck exploring only a subset of configurations.
 
-In some simple models of such landscapes, it was recently found that marginal
+In some simple models of rugged landscapes, it was recently found that marginal
 minima are significant as the attractors of gradient descent dynamics
 \cite{Folena_2020_Rethinking, Folena_2023_On}. This extends to more novel
-algorithms, like message passing \cite{}. 
-While it is still not known how to predict which marginal minima will be
-attractors, this ubiquity of behavior suggests that cartography of marginal
-minima is a useful step in bounding out-of-equilibrium dynamical behavior.
-
-In the traditional methods for analyzing the geometric structure of rugged
-landscapes, it is not necessarily straightforward to condition an analysis on
-the marginality of minima. Using the method of a Legendre transformation of the
-Parisi parameter corresponding to a set of real replicas, one can force the
-result to be marginal by restricting the value of that parameter, but this
+algorithms, like approximate message passing
+\cite{Kent-Dobias_2024_Algorithm-independent}. While it is still not known how
+to predict which marginal minima will be attractors, this ubiquity of behavior
+suggests that understanding the distribution of marginal minima is a useful
+step in bounding out-of-equilibrium dynamical behavior.
+
+It is not straightforward to condition on the marginality of minima using the
+traditional methods for analyzing the distribution of minima in rugged
+landscapes. Using the method of a Legendre transformation of the Parisi
+parameter corresponding to a set of real replicas, one can force the result to
+correspond with marginal minima by tuning the value of that parameter, but this
 results in only the marginal minima at the energy level at which they are the
-majority of stationary points \cite{Monasson_1995_Structural}. It is now
-understood that out-of-equilibrium dynamics usually goes to marginal minima at
-other energy levels \cite{Folena_2023_On}.
-
-The alternative, used to great success in the spherical models, is to start by
-making a detailing understanding of the Hessian matrix at stationary points.
-Then, one can condition the analysis on whatever properties of the Hessian are
-necessary to lead to marginal minima. This strategy is so successful in the
-spherical models because it is very straightforward to implement: a natural
-parameter in the analysis of these models linearly shifts the spectrum of the
-Hessian, and so fixing this parameter by whatever means naturally allows one to
-require that the Hessian spectrum have a pseudogap.
-Unfortunately this strategy is less straightforward to generalize. Many models
-of interest, especially in inference problems, have Hessian statistics that are
-poorly understood.
+majority of stationary points, commonly known as the threshold energy
+\cite{Monasson_1995_Structural}. It is now understood that out-of-equilibrium
+dynamics typically goes to marginal minima at other energy levels
+\cite{Folena_2023_On}.
+
+The alternative approach, used to great success in the spherical models, is to
+start by developing a detailed understanding of the Hessian matrix at stationary
+points. Then, one can condition the analysis on whatever properties of the
+Hessian are necessary to lead to marginal minima. This strategy is so
+successful in the spherical models because it is straightforward to implement.
+First, the shape of the Hessian's spectrum is independent of energy and even
+whether one sits at a stationary point or not. This is a property of models
+whose energy is a Gaussian random variable \cite{Bray_2007_Statistics}.
+Furthermore, a natural parameter in the analysis of these models linearly
+shifts the spectrum of the Hessian. Therefore, tuning this parameter to a
+specific constant value allows one to require that the Hessian spectrum have a
+pseudogap, and therefore that the associated minima be marginal. Unfortunately
+this strategy is less straightforward to generalize. Many models of interest,
+especially in inference problems, have Hessian statistics that are poorly
+understood. This is especially true for the statistics of the Hessian
+conditioned to lie at stationary points, which is necessary to understand in
+models whose energy is non-Gaussian.
 
 Here, we introduce a generic method for conditioning the statistics of
 stationary points on their marginality. The technique makes use of a novel way
-to condition an integral over parameters to select only those that result in a
-certain value of the smallest eigenvalue of a matrix that is a function of
-those parameters. By requiring that the smallest eigenvalue of the Hessian at
-stationary points be zero, we restrict to marginal minima, either those with a
-pseudogap in their bulk spectrum or those with outlying eigenvectors. We
-provide a heuristic to distinguish these two cases. We demonstrate the method
-on the spherical models, where it is unnecessary but instructive, and on
-extensions of the spherical models with non-GOE Hessians where the technique is
-more useful.
+to condition an integration measure to select only configurations that result
+in a certain value of the smallest eigenvalue of a matrix. By requiring that
+the smallest eigenvalue of the Hessian at stationary points be zero, and
+further looking for a sign that the zero eigenvalue lies at the edge of a
+continuous spectrum, we enforce the condition that the spectrum has a
+pseudogap, and is therefore marginal. We demonstrate the method on the
+spherical models, where it is unnecessary but instructive, and on extensions of
+the spherical models with non-GOE Hessians where the technique is more useful.
 
 \section{Conditioning on the smallest eigenvalue}
 
+In this section, we introduce a general method for conditioning a measure on
+the smallest eigenvalue of some matrix that depends on it. In Section
+\ref{sec:shifted.GOE} we show how this works in perhaps the simplest example of
+GOE random matrices with a shifted diagonal. In the final subsection we
+describe how to extend this method to condition on the presence of a pseudogap
+at the bottom of the spectrum.
+
 \subsection{The general method}
 
 An arbitrary function $g$ of the minimum eigenvalue of a matrix $A$ can be
@@ -377,7 +391,20 @@ that $\hat\lambda=0$.
 
 \section{Marginal complexity in random landscapes}
 
+The methods of the previous section can be used in diverse settings. However,
+we are interested in applying them to study stationary points in random
+landscapes whose Hessian spectrum has a pseudogap -- that is, that are
+marginal. In Section \ref{sec:marginal.kac-rice} we define the marginal
+complexity using the tools of the previous section. In Section
+\ref{sec:general.features} we review several general features of a physicist's
+approach to computing the marginal complexity. In Section
+\ref{sec:superspace_kac-rice} we introduce a representation of the marginal
+complexity in terms of an integral over a superspace, which condenses the
+notation and the resulting calculation and which we will use in one of our
+examples in the next section.
+
 \subsection{Marginal complexity from Kac--Rice}
+\label{sec:marginal.kac-rice}
 
 The situation in the study of random landscapes is often as follows: an
 ensemble of smooth functions $H:\mathbb R^N\to\mathbb R$ define random
@@ -444,7 +471,7 @@ define the complexity of points with a specific energy, stability, and minimum e
 \end{equation}
 In practice, this can be computed by introducing replicas to treat the
 logarithm ($\log x=\lim_{n\to0}\frac\partial{\partial n}x^n$) and replicating
-again to treat each of the normalizations in the numerator. This leads to the expression
+again to treat each of the normalizations in the numerator ($x^{-1}=\lim_{m\to-1}x^m$). This leads to the expression
 \begin{equation} \label{eq:min.complexity.expanded}
   \begin{aligned}
     \Sigma_{\lambda^*}(E,\mu)
@@ -471,6 +498,7 @@ Finally, the marginal complexity is defined by evaluating the complexity conditi
 \end{equation}
 
 \subsection{General features of saddle point computation}
+\label{sec:general.features}
 
 Several elements of the computation of the marginal complexity, and indeed the
 ordinary dominant complexity, follow from the formulae of the above section in
@@ -536,6 +564,7 @@ that the effect of fixing the trace is independent of the disorder and only
 depends on properties of the constraint manifold.
 
 \subsection{Superspace representation}
+\label{sec:superspace_kac-rice}
 
 The ordinary Kac--Rice calculation involves many moving parts, and this method
 for incorporating marginality adds even more. It is therefore convenient to
@@ -603,7 +632,17 @@ of the calculation, terms involving the superspace must be expanded.
 
 \section{Examples}
 
+In this section we present analysis of marginal complexity in three random
+landscapes. In Section \ref{sec:ex.spherical} we apply the methods described
+above to the spherical spin glasses, which reveals some general aspects of the
+calculation. Since the spherical spin glasses are Gaussian and have identical
+GOE spectra at each stationary point, the approach introduced here is overkill.
+In Section \ref{sec:multispherical} we apply the methods to a multispherical
+spin glass, which is still Gaussian but has a non-GOE spectrum that can vary
+between stationary points. Finally, in Section \ref{sec:least.squares} we analyze a model of the sum of squares of random functions, which is non-Gaussian and whose Hessian statistics depend on the conditioning of the energy and gradient.
+
 \subsection{Spherical spin glasses}
+\label{sec:ex.spherical}
 
 The spherical spin glasses are a family of models that encompass every
 isotropic Gaussian field on the hypersphere defined by all $\mathbf x\in\mathbb R^N$ such that $0=\mathbf x^T\mathbf x-N$. One can consider the models as defined by centered Gaussian functions $H$ such that the covariance between two points in the configuration space is
@@ -1149,56 +1188,38 @@ systems is well-established. Here, we introduce a novel extension of the
 traditional approach to incorporate the marginality condition.
 Consider supervectors in the $\mathbb R^{N|4}$ superspace of the form
 \begin{equation}
-  \pmb\phi_{a\alpha}(1,2)
+  \pmb\phi_a^\alpha(1,2)
   =\mathbf x_a
   +\bar\theta_1\pmb\eta_a+\bar{\pmb\eta}_a\theta_1
   +i\hat{\mathbf x}_a\bar\theta_1\theta_1
-  +\mathbf s_{a\alpha}(\bar\theta_1\theta_2+\bar\theta_2\theta_1)
+  +\mathbf s_a^\alpha(\bar\theta_1\theta_2+\bar\theta_2\theta_1)
 \end{equation}
 The traditional complexity problem, outlined in the appendix
 \ref{sec:dominant.complexity}, involves a supervector without the last term.
 \begin{widetext}
-  The replicated number of stationary points conditioned on energy $E$, trace $\mu$, and minimum eigenvalue $\lambda^*$ is then given by
+  Using the notation of Section \ref{sec:superspace_kac-rice}, the replicated
+  number of stationary points conditioned on energy $E$, trace $\mu$, and
+  minimum eigenvalue $\lambda^*$ is then given by
 \begin{equation}
   \begin{aligned}
     \mathcal N(E,\mu,\lambda^*)^n
-    &=\int\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_{a\alpha}
+    &=\int\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_a^\alpha
     \exp\left\{
-      \delta_{\alpha1}N(\hat\beta_aE+\hat\lambda_a\lambda^*)
-      +\int d1\,d2\,B_{a\alpha}(1,2)\left[H(\pmb\phi_{a\alpha})+\frac12\mu(\|\pmb\phi_{a\alpha}\|^2-N)\right]
+      \delta^{\alpha1}N(\hat\beta_aE+\hat\lambda_a\lambda^*)
+      -\frac12\int d1\,d2\,B_a^\alpha(1,2)\left[\sum_{k=1}^MV_k(\pmb\phi_a^\alpha)^2
+      -\mu(\|\pmb\phi_a^\alpha\|^2-N)\right]
     \right\}
   \end{aligned}
 \end{equation}
-where we use the compact notation $d1=d\theta_1\,d\bar\theta_1$ for the
-measures associated with the Grassmann directions. Here we have also defined
-\begin{equation}
-  B_{a\alpha}(1,2)=\delta_{\alpha1}\bar\theta_2\theta_2
-        (1-\hat\beta_a\bar\theta_1\theta_1)
-        -\delta_{\alpha1}\hat\lambda_a-\beta
-\end{equation}
-which encodes various aspects of the complexity problem, and the measure
-\begin{align}
-  d\pmb\phi_{a\alpha}
-  =d\mathbf x_a\,\delta(\|\mathbf x_a\|^2-N)\,\frac{d\hat{\mathbf x}_a}{(2\pi)^N}\,d\pmb\eta_a\,d\bar{\pmb\eta}_a\,
-  d\mathbf s_{a\alpha}\,\delta(\|\mathbf s_{a\alpha}\|^2-N)\,
-  \delta(\mathbf x_a^T\mathbf s_{a\alpha})
-\end{align}
-encoding the measures of all the superfield's constituent variables. Expanding
-functions of the superfield in the coordinates $\theta$ and performing the
-integrals, this expression is equivalent to that of the replicated Kac--Rice
-integrand \eqref{eq:min.complexity.expanded} with the substitutions of the
-Dirac $\delta$ functions of \eqref{eq:delta.grad}, \eqref{eq:delta.energy}, and
-\eqref{eq:delta.eigen}.
-
 The first step to evaluate this expression is to linearize the dependence on the random functions $V$. This is accomplished by inserting into the integral a Dirac $\delta$ function fixing the value of the energy for each replica, or
 \begin{equation}
   \delta\big(
-    V^k(\pmb\phi_{a\alpha}(1,2))-v_{a\alpha}^k(1,2)
+    V_k(\pmb\phi_a^\alpha(1,2))-v_{ka}^\alpha(1,2)
   \big)
   =
-  \int\prod_{a\alpha k}d\hat v_{a\alpha}^k\exp\left[
-    i\int d1\,d2\,\hat v_{a\alpha}^k(1,2)
-    \big(V^k(\pmb\phi_{a\alpha}(1,2))-v_{a\alpha}^k(1,2)\big)
+  \int d\hat v_{ka}^\alpha\exp\left[
+    i\int d1\,d2\,\hat v_{ka}^\alpha(1,2)
+    \big(V_k(\pmb\phi_a^\alpha(1,2))-v_{ka}^\alpha(1,2)\big)
   \right]
 \end{equation}
 where we have introduced auxiliary fields $\hat v$. With this inserted into the
@@ -1208,26 +1229,26 @@ the Fourier representation of the Dirac $\delta$ function. This term is linear i
 \begin{equation}
   \overline{
     \exp\left[
-      i\sum_{a\alpha k}\int d1\,d2\,\hat v_{a\alpha}^k(1,2)
-      V^k(\pmb\phi_{a\alpha}(1,2))
+      i\sum_{ka\alpha}\int d1\,d2\,\hat v_{ka}^\alpha(1,2)
+      V_k(\pmb\phi_a^\alpha(1,2))
     \right]
   }
   =
   -\frac N2\sum_{ab}^n\sum_{\alpha\gamma}^{m_a}\sum_k^{\alpha N}\int d1\,d2\,d3\,d4\,
-  \hat v_{a\alpha}^k(1,2)f\big(\pmb\phi_{a\alpha}(1,2)^T\pmb\phi_{b\gamma}(3,4)\big)\hat v_{b\gamma}^k(3,4)
+  \hat v_{ka}^\alpha(1,2)f\big(\pmb\phi_a^\alpha(1,2)^T\pmb\phi_b^\gamma(3,4)\big)\hat v_{kb}^\gamma(3,4)
 \end{equation}
 The entire integrand is now quadratic in the $v$ and $\hat v$ with the kernel
 \begin{equation}
   \begin{bmatrix}
-    B_{a\alpha}(1,2)\delta(1,3)\delta(2,4)\delta_{ab}\delta_{\alpha\gamma} & i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta_{\alpha\gamma}\\
-    i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta_{\alpha\gamma}& f\big(\pmb\phi_{a\alpha}(1,2)^T\pmb\phi_{b\gamma}(3,4)\big)
+    B_a^\alpha(1,2)\delta(1,3)\delta(2,4)\delta_{ab}\delta^{\alpha\gamma} & i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta^{\alpha\gamma}\\
+    i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta^{\alpha\gamma}& f\big(\pmb\phi_a^\alpha(1,2)^T\pmb\phi_b^\gamma(3,4)\big)
   \end{bmatrix}
 \end{equation}
 The integration over the $v$ and $\hat v$ results in a term in the effective action of the form
 \begin{equation}
   -\frac M2\log\operatorname{sdet}\left(
-    \delta(1,3)\,\delta(2,4) \delta_{ab}\delta_{\alpha\gamma}
-    +B_{a\alpha}(1,2)f\big(\pmb\phi_{a\alpha}(1,2)^T\pmb\phi_{b\gamma}(3,4)\big)
+    \delta(1,3)\,\delta(2,4) \delta_{ab}\delta^{\alpha\gamma}
+    +B_a^\alpha(1,2)f\big(\pmb\phi_a^\alpha(1,2)^T\pmb\phi_b^\gamma(3,4)\big)
   \right)
 \end{equation}
 When expanded, this supermatrix is constructed of the scalar products of the
@@ -1240,10 +1261,9 @@ Up to this point, the expressions above are general and independent of a given
 ansatz. However, we expect that the order parameters $X$ and $\hat X$ are zero,
 since this case is isotropic. Applying this ansatz here avoids a dramatically
 more complicated expression for the effective action found in the case with
-arbitrary $X$ and $\hat X$. We also will apply the ansatz that $Q_{a\alpha
-b\gamma}$ is zero for $a\neq b$, which is equivalent to assuming that the soft
+arbitrary $X$ and $\hat X$. We also will apply the ansatz that $Q_{ab}^{\alpha\gamma}$ is zero for $a\neq b$, which is equivalent to assuming that the soft
 directions of typical pairs of stationary points are uncorrelated, and further
-that $Q_{\alpha\gamma}=Q_{a\alpha a\gamma}$ independently of the index $a$,
+that $Q^{\alpha\gamma}=Q_{aa}^{\alpha\gamma}$ independently of the index $a$,
 implying that correlations in the tangent space of typical stationary points
 are the same.
 
@@ -1252,7 +1272,7 @@ Given these simplifying forms of the ansatz, taking the superdeterminant yields
   \begin{aligned}
     \log\det\left\{
       \left[
-        f'(C)\odot D-\hat\beta I+\left(R^{\circ2}-G^{\circ2}+I\sum_{\alpha\gamma}2(\delta_{\alpha1}\hat\lambda+\beta)(\delta_{\gamma1}\hat\lambda+\beta)Q_{\alpha\gamma}^2\right)\odot f''(C)
+        f'(C)\odot D-\hat\beta I+\left(R^{\circ2}-G^{\circ2}+I\sum_{\alpha\gamma}2(\delta^{\alpha1}\hat\lambda+\beta)(\delta^{\gamma1}\hat\lambda+\beta)Q_{\alpha\gamma}^2\right)\odot f''(C)
       \right]f(C)
       +(I-R\odot f'(C))^2
     \right\} \\
@@ -1335,7 +1355,31 @@ taking the zero-temperature limit, we find
   }
 \end{figure}
 
-\section{Conclusion}
+\section{Conclusions}
+
+We have introduced a method for conditioning complexity on the marginality of
+stationary points. This method is in principle completely general, and permits
+this conditioning without first needing to understand the entire Hessian
+statistics. We used our approach to study the marginal complexity in three
+different models of random landscapes, showing that the method works and can be
+applied to models whose marginal complexity was not previously known. In our
+companion paper, we further show that the marginal complexity in the third
+model of random nonlinear least squares can be used to effectively bound
+algorithmic performance \cite{Kent-Dobias_2024_Algorithm-independent}.
+
+There are some limitations to the approach we have largely relied on in this
+paper. The main limitation is our restriction to signalless landscapes, where
+there is no symmetry-breaking favored direction. This allowed us to neglect the
+presence of stationary points with isolated eigenvalues as atypical, and
+therefore apply the marginal conditioning using a variational principle.
+However, most models of interest in inference have a nonzero signal strength
+and therefore often have typical stationary points with an isolated eigenvalue.
+As we described earlier, marginal complexity can still be analyzed in these
+systems by tuning the shift $\mu$ until the large-deviation principle breaks
+down and an imaginary part of the complexity appears. However, this is an
+inconvenient measure. It is possible that a variational approach can be
+preserved by treating the direction toward and the directions orthogonal to the
+signal differently. This problem merits further research.
 
 \begin{acknowledgements}
   JK-D is supported by a \textsc{DynSysMath} Specific Initiative of the INFN.
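
The marginality condition that runs through this patch can be illustrated with the shifted GOE example named in the new overview of the section on conditioning on the smallest eigenvalue. The following is a minimal sketch and is not taken from the paper. It assumes a GOE matrix B whose off-diagonal entries have variance sigma^2/N, so that the large-N spectrum of B + mu*I is a Wigner semicircle on [mu - 2*sigma, mu + 2*sigma]; the paper's own normalization may differ. All function names are hypothetical. Under that assumption the bulk edge touches zero at mu = 2*sigma, where the density vanishes as the square root of the eigenvalue, which is the pseudogap of a marginal spectrum.

```python
import math


def semicircle_density(lam, mu, sigma=1.0):
    """Large-N spectral density of B + mu*I at eigenvalue lam.

    Assumption: B is GOE with off-diagonal variance sigma**2 / N, so the
    density is a semicircle of radius 2*sigma centred on the shift mu.
    """
    x = lam - mu
    if abs(x) >= 2.0 * sigma:
        return 0.0
    return math.sqrt(4.0 * sigma**2 - x**2) / (2.0 * math.pi * sigma**2)


def smallest_eigenvalue(mu, sigma=1.0):
    """Lower edge of the bulk spectrum (no isolated eigenvalue assumed)."""
    return mu - 2.0 * sigma


def marginal_shift(sigma=1.0):
    """Shift mu at which the lower bulk edge sits exactly at zero."""
    return 2.0 * sigma


def integrate_density(mu, sigma=1.0, n=20000):
    """Midpoint-rule integral of the density over its support."""
    lo = mu - 2.0 * sigma
    h = 4.0 * sigma / n
    return h * sum(
        semicircle_density(lo + (i + 0.5) * h, mu, sigma) for i in range(n)
    )


if __name__ == "__main__":
    mu_m = marginal_shift(1.0)
    eps = 1e-6
    # Quadrupling the distance from the edge should double the density
    # if the density vanishes like a square root (pseudogap).
    ratio = semicircle_density(4 * eps, mu_m) / semicircle_density(eps, mu_m)
    print("marginal shift:", mu_m)
    print("smallest eigenvalue at marginal shift:", smallest_eigenvalue(mu_m))
    print("edge scaling ratio:", ratio)
    print("normalization:", integrate_density(mu_m))
```

For mu larger than the marginal shift the spectrum is gapped and the minimum is stable; for smaller mu the bulk extends to negative eigenvalues and the point is a saddle. The sketch only covers the bulk edge and says nothing about the isolated eigenvalues discussed in the conclusions.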