diff options
Diffstat (limited to 'marginal.tex')
-rw-r--r-- | marginal.tex | 224 |
1 files changed, 134 insertions, 90 deletions
diff --git a/marginal.tex b/marginal.tex index 6451147..3a94ca0 100644 --- a/marginal.tex +++ b/marginal.tex @@ -8,11 +8,11 @@ \usepackage[dvipsnames]{xcolor} \usepackage[ colorlinks=true, - urlcolor=Black, - citecolor=Black, - filecolor=Black, - linkcolor=Black -]{hyperref} % ref and cite links with pretty colors + urlcolor=BlueViolet, + citecolor=BlueViolet, + filecolor=BlueViolet, + linkcolor=BlueViolet +]{hyperref} \begin{document} @@ -47,58 +47,72 @@ \section{Introduction} Systems with rugged landscapes are important across many disciplines, from the -physics to glasses and spin-glasses to the statistical inference problems. The +physics to glasses and spin-glasses to statistical inference problems. The behavior of these systems is best understood when equilibrium or optimal -solutions are studied and averages can be taken statically over all possible -configurations. However, such systems are also infamous for their tendency to -defy equilibrium and optimal expectations in practice, due to the presence of -dynamic transitions or crossovers that leave physical or algorithmic dynamics -stuck exploring only a subset of configurations. +solutions are studied and weighted averages can be taken statically over all +possible configurations. However, such systems are also infamous for their +tendency to defy equilibrium and optimal expectations in practice, due to the +presence of dynamic transitions or crossovers that leave physical or +algorithmic dynamics stuck exploring only a subset of configurations. -In some simple models of such landscapes, it was recently found that marginal +In some simple models of rugged landscapes, it was recently found that marginal minima are significant as the attractors of gradient descent dynamics \cite{Folena_2020_Rethinking, Folena_2023_On}. This extends to more novel -algorithms, like message passing \cite{}. -While it is still not known how to predict which marginal minima will be -attractors, this ubiquity of behavior suggests that cartography of marginal -minima is a useful step in bounding out-of-equilibrium dynamical behavior. - -In the traditional methods for analyzing the geometric structure of rugged -landscapes, it is not necessarily straightforward to condition an analysis on -the marginality of minima. Using the method of a Legendre transformation of the -Parisi parameter corresponding to a set of real replicas, one can force the -result to be marginal by restricting the value of that parameter, but this +algorithms, like approximate message passing +\cite{Kent-Dobias_2024_Algorithm-independent}. While it is still not known how +to predict which marginal minima will be attractors, this ubiquity of behavior +suggests that understanding the distribution of marginal minima is a useful +step in bounding out-of-equilibrium dynamical behavior. + +It is not straightforward to condition on the marginality of minima using the +traditional methods for analyzing the distribution of minima in rugged +landscapes. Using the method of a Legendre transformation of the Parisi +parameter corresponding to a set of real replicas, one can force the result to +correspond with marginal minima by tuning the value of that parameter, but this results in only the marginal minima at the energy level at which they are the -majority of stationary points \cite{Monasson_1995_Structural}. It is now -understood that out-of-equilibrium dynamics usually goes to marginal minima at -other energy levels \cite{Folena_2023_On}. - -The alternative, used to great success in the spherical models, is to start by -making a detailing understanding of the Hessian matrix at stationary points. -Then, one can condition the analysis on whatever properties of the Hessian are -necessary to lead to marginal minima. This strategy is so successful in the -spherical models because it is very straightforward to implement: a natural -parameter in the analysis of these models linearly shifts the spectrum of the -Hessian, and so fixing this parameter by whatever means naturally allows one to -require that the Hessian spectrum have a pseudogap. -Unfortunately this strategy is less straightforward to generalize. Many models -of interest, especially in inference problems, have Hessian statistics that are -poorly understood. +majority of stationary points, commonly known as the threshold energy +\cite{Monasson_1995_Structural}. It is now understood that out-of-equilibrium +dynamics typically goes to marginal minima at other energy levels +\cite{Folena_2023_On}. + +The alternative approach, used to great success in the spherical models, is to +start by making a detailed understanding of the Hessian matrix at stationary +points. Then, one can condition the analysis on whatever properties of the +Hessian are necessary to lead to marginal minima. This strategy is so +successful in the spherical models because it is straightforward to implement. +First, the shape of the Hessian's spectrum is independent of energy and even +whether one sits at a stationary point or not. This is a property of models +whose energy is a Gaussian random variable \cite{Bray_2007_Statistics}. +Furthermore, a natural parameter in the analysis of these models linearly +shifts the spectrum of the Hessian. Therefore, tuning this parameter to a +specific constant value allows one to require that the Hessian spectrum have a +pseudogap, and therefore that the associated minima be marginal. Unfortunately +this strategy is less straightforward to generalize. Many models of interest, +especially in inference problems, have Hessian statistics that are poorly +understood. This is especially true for the statistics of the Hessian +conditioned to lie at stationary points, which is necessary to understand in +models whose energy is non-Gaussian. Here, we introduce a generic method for conditioning the statistics of stationary points on their marginality. The technique makes use of a novel way -to condition an integral over parameters to select only those that result in a -certain value of the smallest eigenvalue of a matrix that is a function of -those parameters. By requiring that the smallest eigenvalue of the Hessian at -stationary points be zero, we restrict to marginal minima, either those with a -pseudogap in their bulk spectrum or those with outlying eigenvectors. We -provide a heuristic to distinguish these two cases. We demonstrate the method -on the spherical models, where it is unnecessary but instructive, and on -extensions of the spherical models with non-GOE Hessians where the technique is -more useful. +to condition an integration measure to select only configurations that result +in a certain value of the smallest eigenvalue of a matrix. By requiring that +the smallest eigenvalue of the Hessian at stationary points be zero, and +further looking for a sign that the zero eigenvalue lies at the edge of a +continuous spectrum, we enforce the condition that the spectrum has a +pseudogap, and is therefore marginal. We demonstrate the method on the +spherical models, where it is unnecessary but instructive, and on extensions of +the spherical models with non-GOE Hessians where the technique is more useful. \section{Conditioning on the smallest eigenvalue} +In this section, we introduce a general method for conditioning a measure on +the smallest eigenvalue of some matrix that depends on it. In Section +\ref{sec:shifted.GOE} we show how this works in perhaps the simplest example of +GOE random matrices with a shifted diagonal. In the final subsection we +describe how to extend this method to condition on the presence of a pseudogap +at the bottom on the spectrum. + \subsection{The general method} An arbitrary function $g$ of the minimum eigenvalue of a matrix $A$ can be @@ -377,7 +391,20 @@ that $\hat\lambda=0$. \section{Marginal complexity in random landscapes} +The methods of the previous section can be used in diverse settings. However, +we are interested in applying them to study stationary points in random +landscapes whose Hessian spectrum has a pseudogap -- that is, that are +marginal. In Section \ref{sec:marginal.kac-rice} we define the marginal +complexity using the tools of the previous section. In Section +\ref{sec:general.features} we review several general features in a physicists' +approach to computing the marginal complexity. In Section +\ref{sec:superspace_kac-rice} we introduce a representation of the marginal +complexity in terms of an integral over a superspace, which condenses the +notation and the resulting calculation and which we will use in one of our +examples in the next section. + \subsection{Marginal complexity from Kac--Rice} +\label{sec:marginal.kac-rice} The situation in the study of random landscapes is often as follows: an ensemble of smooth functions $H:\mathbb R^N\to\mathbb R$ define random @@ -444,7 +471,7 @@ define the complexity of points with a specific energy, stability, and minimum e \end{equation} In practice, this can be computed by introducing replicas to treat the logarithm ($\log x=\lim_{n\to0}\frac\partial{\partial n}x^n$) and replicating -again to treat each of the normalizations in the numerator. This leads to the expression +again to treat each of the normalizations in the numerator ($x^{-1}=\lim_{m\to-1}x^m$). This leads to the expression \begin{equation} \label{eq:min.complexity.expanded} \begin{aligned} \Sigma_{\lambda^*}(E,\mu) @@ -471,6 +498,7 @@ Finally, the marginal complexity is defined by evaluating the complexity conditi \end{equation} \subsection{General features of saddle point computation} +\label{sec:general.features} Several elements of the computation of the marginal complexity, and indeed the ordinary dominant complexity, follow from the formulae of the above section in @@ -536,6 +564,7 @@ that the effect of fixing the trace is independent of the disorder and only depends on properties of the constraint manifold. \subsection{Superspace representation} +\label{sec:superspace_kac-rice} The ordinary Kac--Rice calculation involves many moving parts, and this method for incorporating marginality adds even more. It is therefore convenient to @@ -603,7 +632,17 @@ of the calculation, terms involving the superspace must be expanded. \section{Examples} +In this section we present analysis of marginal complexity in three random +landscapes. In Section \ref{sec:ex.spherical} we apply the methods described +above to the spherical spin glasses, which reveals some general aspects of the +calculation. Since the spherical spin glasses are Gaussian and have identical +GOE spectra at each stationary point, the approach introduced here is overkill. +In Section \ref{sec:multispherical} we apply the methods to a multispherical +spin glass, which is still Gaussian but has a non-GOE spectrum that can vary +between stationary points. Finally, in Section \ref{sec:least.squares} we analyze a model of the sum of squares of random functions, which is non-Gaussian and whose Hessian statistics depend on the conditioning of the energy and gradient. + \subsection{Spherical spin glasses} +\label{sec:ex.spherical} The spherical spin glasses are a family of models that encompass every isotropic Gaussian field on the hypersphere defined by all $\mathbf x\in\mathbb R^N$ such that $0=\mathbf x^T\mathbf x-N$. One can consider the models as defined by centered Gaussian functions $H$ such that the covariance between two points in the configuration space is @@ -1149,56 +1188,38 @@ systems is well-established. Here, we introduce a novel extension of the traditional approach to incorporate the marginality condition. Consider supervectors in the $\mathbb R^{N|4}$ superspace of the form \begin{equation} - \pmb\phi_{a\alpha}(1,2) + \pmb\phi_a^\alpha(1,2) =\mathbf x_a +\bar\theta_1\pmb\eta_a+\bar{\pmb\eta}_a\theta_1 +i\hat{\mathbf x}_a\bar\theta_1\theta_1 - +\mathbf s_{a\alpha}(\bar\theta_1\theta_2+\bar\theta_2\theta_1) + +\mathbf s_a^\alpha(\bar\theta_1\theta_2+\bar\theta_2\theta_1) \end{equation} The traditional complexity problem, outlined in the appendix \ref{sec:dominant.complexity}, involves a supervector without the last term. \begin{widetext} - The replicated number of stationary points conditioned on energy $E$, trace $\mu$, and minimum eigenvalue $\lambda^*$ is then given by + Using the notation of Section \ref{sec:superspace_kac-rice}, the replicated + number of stationary points conditioned on energy $E$, trace $\mu$, and + minimum eigenvalue $\lambda^*$ is then given by \begin{equation} \begin{aligned} \mathcal N(E,\mu,\lambda^*)^n - &=\int\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_{a\alpha} + &=\int\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_a^\alpha \exp\left\{ - \delta_{\alpha1}N(\hat\beta_aE+\hat\lambda_a\lambda^*) - +\int d1\,d2\,B_{a\alpha}(1,2)\left[H(\pmb\phi_{a\alpha})+\frac12\mu(\|\pmb\phi_{a\alpha}\|^2-N)\right] + \delta^{\alpha1}N(\hat\beta_aE+\hat\lambda_a\lambda^*) + -\frac12\int d1\,d2\,B_a^\alpha(1,2)\left[\sum_{k=1}^MV_k(\pmb\phi_a^\alpha)^2 + -\mu(\|\pmb\phi_a^\alpha\|^2-N)\right] \right\} \end{aligned} \end{equation} -where we use the compact notation $d1=d\theta_1\,d\bar\theta_1$ for the -measures associated with the Grassmann directions. Here we have also defined -\begin{equation} - B_{a\alpha}(1,2)=\delta_{\alpha1}\bar\theta_2\theta_2 - (1-\hat\beta_a\bar\theta_1\theta_1) - -\delta_{\alpha1}\hat\lambda_a-\beta -\end{equation} -which encodes various aspects of the complexity problem, and the measure -\begin{align} - d\pmb\phi_{a\alpha} - =d\mathbf x_a\,\delta(\|\mathbf x_a\|^2-N)\,\frac{d\hat{\mathbf x}_a}{(2\pi)^N}\,d\pmb\eta_a\,d\bar{\pmb\eta}_a\, - d\mathbf s_{a\alpha}\,\delta(\|\mathbf s_{a\alpha}\|^2-N)\, - \delta(\mathbf x_a^T\mathbf s_{a\alpha}) -\end{align} -encoding the measures of all the superfield's constituent variables. Expanding -functions of the superfield in the coordinates $\theta$ and performing the -integrals, this expression is equivalent to that of the replicated Kac--Rice -integrand \eqref{eq:min.complexity.expanded} with the substitutions of the -Dirac $\delta$ functions of \eqref{eq:delta.grad}, \eqref{eq:delta.energy}, and -\eqref{eq:delta.eigen}. - The first step to evaluate this expression is to linearize the dependence on the random functions $V$. This is accomplished by inserting into the integral a Dirac $\delta$ function fixing the value of the energy for each replica, or \begin{equation} \delta\big( - V^k(\pmb\phi_{a\alpha}(1,2))-v_{a\alpha}^k(1,2) + V_k(\pmb\phi_a^\alpha(1,2))-v_{ka}^\alpha(1,2) \big) = - \int\prod_{a\alpha k}d\hat v_{a\alpha}^k\exp\left[ - i\int d1\,d2\,\hat v_{a\alpha}^k(1,2) - \big(V^k(\pmb\phi_{a\alpha}(1,2))-v_{a\alpha}^k(1,2)\big) + \int d\hat v_{ka}^\alpha\exp\left[ + i\int d1\,d2\,\hat v_{ka}^\alpha(1,2) + \big(V_k(\pmb\phi_a^\alpha(1,2))-v_{ka}^\alpha(1,2)\big) \right] \end{equation} where we have introduced auxiliary fields $\hat v$. With this inserted into the @@ -1208,26 +1229,26 @@ the Fourier representation of the Dirac $\delta$ function. This term is linear i \begin{equation} \overline{ \exp\left[ - i\sum_{a\alpha k}\int d1\,d2\,\hat v_{a\alpha}^k(1,2) - V^k(\pmb\phi_{a\alpha}(1,2)) + i\sum_{ka\alpha}\int d1\,d2\,\hat v_{ka}^\alpha(1,2) + V_k(\pmb\phi_a^\alpha(1,2)) \right] } = -\frac N2\sum_{ab}^n\sum_{\alpha\gamma}^{m_a}\sum_k^{\alpha N}\int d1\,d2\,d3\,d4\, - \hat v_{a\alpha}^k(1,2)f\big(\pmb\phi_{a\alpha}(1,2)^T\pmb\phi_{b\gamma}(3,4)\big)\hat v_{b\gamma}^k(3,4) + \hat v_{ka}^\alpha(1,2)f\big(\pmb\phi_a^\alpha(1,2)^T\pmb\phi_b^\gamma(3,4)\big)\hat v_{kb}^\gamma(3,4) \end{equation} The entire integrand is now quadratic in the $v$ and $\hat v$ with the kernel \begin{equation} \begin{bmatrix} - B_{a\alpha}(1,2)\delta(1,3)\delta(2,4)\delta_{ab}\delta_{\alpha\gamma} & i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta_{\alpha\gamma}\\ - i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta_{\alpha\gamma}& f\big(\pmb\phi_{a\alpha}(1,2)^T\pmb\phi_{b\gamma}(3,4)\big) + B_a^\alpha(1,2)\delta(1,3)\delta(2,4)\delta_{ab}\delta^{\alpha\gamma} & i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta^{\alpha\gamma}\\ + i\delta(1,3)\,\delta(2,4) \delta_{ab}\delta^{\alpha\gamma}& f\big(\pmb\phi_a^\alpha(1,2)^T\pmb\phi_b^\gamma(3,4)\big) \end{bmatrix} \end{equation} The integration over the $v$ and $\hat v$ results in a term in the effective action of the form \begin{equation} -\frac M2\log\operatorname{sdet}\left( - \delta(1,3)\,\delta(2,4) \delta_{ab}\delta_{\alpha\gamma} - +B_{a\alpha}(1,2)f\big(\pmb\phi_{a\alpha}(1,2)^T\pmb\phi_{b\gamma}(3,4)\big) + \delta(1,3)\,\delta(2,4) \delta_{ab}\delta^{\alpha\gamma} + +B_a^\alpha(1,2)f\big(\pmb\phi_a^\alpha(1,2)^T\pmb\phi_b^\gamma(3,4)\big) \right) \end{equation} When expanded, this supermatrix is constructed of the scalar products of the @@ -1240,10 +1261,9 @@ Up to this point, the expressions above are general and independent of a given ansatz. However, we expect that the order parameters $X$ and $\hat X$ are zero, since this case is isotropic. Applying this ansatz here avoids a dramatically more complicated expression for the effective action found in the case with -arbitrary $X$ and $\hat X$. We also will apply the ansatz that $Q_{a\alpha -b\gamma}$ is zero for $a\neq b$, which is equivalent to assuming that the soft +arbitrary $X$ and $\hat X$. We also will apply the ansatz that $Q_{ab}^{\alpha\gamma}$ is zero for $a\neq b$, which is equivalent to assuming that the soft directions of typical pairs of stationary points are uncorrelated, and further -that $Q_{\alpha\gamma}=Q_{a\alpha a\gamma}$ independently of the index $a$, +that $Q^{\alpha\gamma}=Q_{aa}^{\alpha\gamma}$ independently of the index $a$, implying that correlations in the tangent space of typical stationary points are the same. @@ -1252,7 +1272,7 @@ Given these simplifying forms of the ansatz, taking the superdeterminant yields \begin{aligned} \log\det\left\{ \left[ - f'(C)\odot D-\hat\beta I+\left(R^{\circ2}-G^{\circ2}+I\sum_{\alpha\gamma}2(\delta_{\alpha1}\hat\lambda+\beta)(\delta_{\gamma1}\hat\lambda+\beta)Q_{\alpha\gamma}^2\right)\odot f''(C) + f'(C)\odot D-\hat\beta I+\left(R^{\circ2}-G^{\circ2}+I\sum_{\alpha\gamma}2(\delta^{\alpha1}\hat\lambda+\beta)(\delta^{\gamma1}\hat\lambda+\beta)Q_{\alpha\gamma}^2\right)\odot f''(C) \right]f(C) +(I-R\odot f'(C))^2 \right\} \\ @@ -1335,7 +1355,31 @@ taking the zero-temperature limit, we find } \end{figure} -\section{Conclusion} +\section{Conclusions} + +We have introduced a method for conditioning complexity on the marginality of +stationary points. This method is in principal completely general, and permits +this conditioning without first needing to understand the entire Hessian +statistics. We used our approach to study the marginal complexity in three +different models of random landscapes, showing that the method works and can be +applied to models whose marginal complexity was not previously known. In our +companion paper, we further show that the marginal complexity in the third +model of random nonlinear least squares can be used to effectively bound +algorithmic performance \cite{Kent-Dobias_2024_Algorithm-independent}. + +There are some limitations to the approach we have largely relied in this +paper. The main limitation is our restriction to signalless landscapes, where +there is no symmetry-breaking favored direction. This allowed us to neglect the +presence of stationary points with isolated eigenvalues as atypical, and +therefore apply the marginal conditioning using a variational principle. +However, most models of interest in inference have a nonzero signal strength +and therefore often have typical stationary points with an isolated eigenvalue. +As we described earlier, marginal complexity can still be analyzed in these +systems by tuning the shift $\mu$ until the large-deviation principle breaks +down and an imaginary part of the complexity appears. However, this is an +inconvenient measure. It's possible that a variational approach can be +preserved by treating the direction toward and the directions orthogonal to the +signal differently. This problem merits further research. \begin{acknowledgements} JK-D is supported by a \textsc{DynSysMath} Specific Initiative of the INFN. |