-rw-r--r-- | marginal.bib | 14 |
-rw-r--r-- | marginal.tex | 419 |
2 files changed, 222 insertions, 211 deletions
diff --git a/marginal.bib b/marginal.bib index f25ff6d..1643aa9 100644 --- a/marginal.bib +++ b/marginal.bib @@ -108,6 +108,20 @@ primaryclass = {cond-mat.dis-nn} } +@article{Crisanti_1993_The, + author = {Crisanti, A. and Horner, H. and Sommers, H.-J.}, + title = {The spherical $p$-spin interaction spin-glass model}, + journal = {Zeitschrift für Physik B Condensed Matter}, + publisher = {Springer Science and Business Media LLC}, + year = {1993}, + month = {6}, + number = {2}, + volume = {92}, + pages = {257--271}, + url = {https://doi.org/10.1007%2Fbf01312184}, + doi = {10.1007/bf01312184} +} + @article{Dean_2006_Large, author = {Dean, David S. and Majumdar, Satya N.}, title = {Large Deviations of Extreme Eigenvalues of Random Matrices}, diff --git a/marginal.tex b/marginal.tex index 3eeed6b..1fdaee4 100644 --- a/marginal.tex +++ b/marginal.tex @@ -722,21 +722,20 @@ We will make use of this representation to simplify the analysis of the marginal \label{sec:examples} In this section we present analysis of marginal complexity in three random -landscapes. In Section \ref{sec:ex.spherical} we apply the methods described -above to the spherical spin glasses, which reveals some general aspects of the +landscapes. In Section \ref{sec:ex.spherical} we treat the spherical spin glasses, which reveals some general aspects of the calculation. Since the spherical spin glasses are Gaussian and have identical GOE spectra at each stationary point, the approach introduced here is overkill. In Section \ref{sec:multispherical} we apply the methods to a multispherical -spin glass, which is still Gaussian but has a non-GOE spectrum that can vary -between stationary points. Finally, in Section \ref{sec:least.squares} we analyze a model of the sum of squares of random functions, which is non-Gaussian and whose Hessian statistics depend on the conditioning of the energy and gradient. 
+spin glass, which is still Gaussian but has a non-GOE spectrum whose shape can vary +between stationary points. Finally, in Section \ref{sec:least.squares} we analyze a model of sums of squared random functions, which is non-Gaussian and whose Hessian statistics depend on the conditioning of the energy and gradient. \subsection{Spherical spin glasses} \label{sec:ex.spherical} The spherical spin glasses are a family of models that encompass every -isotropic Gaussian field on the hypersphere defined by all $\mathbf x\in\mathbb +isotropic Gaussian field on the hypersphere. Their configuration space is the sphere $S^{N-1}$ defined by all $\mathbf x\in\mathbb R^N$ such that $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$. One can consider the models as -defined by centered Gaussian functions $H$ such that the covariance between two +defined by ensembles of centered Gaussian functions $H$ such that the covariance between two points in the configuration space is \begin{equation} \overline{H(\mathbf x)H(\mathbf x')}=Nf\left(\frac{\mathbf x\cdot\mathbf x'}N\right) @@ -748,7 +747,11 @@ for some function $f$ with positive series coefficients. Such functions can be c \sum_{i_1\cdots i_p}^NJ_{i_1\cdots i_p}x_{i_1}\cdots x_{i_p} \end{equation} and the elements of the tensors $J$ being independently distributed with the -unit normal distribution. +unit normal distribution \cite{Crisanti_1993_The}. We focus on marginal minima +in models with $f'(0)=0$, which corresponds to models without a random external +field. Such a random field would correspond in each individual sample $H$ to a +signal, and therefore complicate the analysis by correlating the positions of +stationary points and the eigenvectors of their Hessians. The marginal optima of these models can be studied without the methods introduced in this paper, and have been in the past \cite{Folena_2020_Rethinking, @@ -759,9 +762,9 @@ mostly independently from the problem of counting stationary points. 
Second, in these models the Hessian at every point in the landscape belongs to the GOE class with the same width of the spectrum $\mu_\mathrm m=2\sqrt{f''(1)}$. Therefore, all marginal minima in these systems have the same constant shift -$\mu=\mu_\mathrm m$. Despite the fact the complexity of marginal optima is +$\mu=\mu_\mathrm m$. Despite the fact that the complexity of marginal optima is well known by simpler methods, it is instructive to carry through the -calculation for this case, since we will learn something about its application in +calculation for this case, since we will learn some things about its application in more nontrivial settings. The procedure to treat the complexity of the spherical models has been made in @@ -778,7 +781,7 @@ Once these substitutions have been made, the entire expression \eqref{eq:min.complexity.expanded} is an exponential integral whose argument is a linear functional of $H$. This allows for the average to be taken over the disorder. If we gather all the $H$-dependent pieces associated with replica $a$ -into the linear functional $\mathcal O_a$ then the average gives +into the linear functional $\mathcal O_a$ then the average over the ensemble of functions $H$ gives \begin{equation} \begin{aligned} \overline{ @@ -788,33 +791,35 @@ into the linear functional $\mathcal O_a$ then the average gives &=e^{N\frac12\sum_a^n\sum_b^n\mathcal O_a\mathcal O_bf\big(\frac{\mathbf x_a\cdot\mathbf x_b}N\big)} \end{aligned} \end{equation} -The result is an integrand that only depends on the many vector variables we -have introduced through their scalar products with each other. We therefore make a change of variables in the integration from those vectors to matrices that encode their possible scalar products. These matrices are -\begin{equation} \label{eq:order.parameters} - \begin{aligned} +The result is an integrand that depends on the many vector variables we +have introduced only through their scalar products with each other. 
We therefore make a change of variables in the integration from those vectors to matrices that encode their possible scalar products. These matrices are +\begin{align} &C_{ab}=\frac1N\mathbf x_a\cdot\mathbf x_b - \qquad\qquad - &R_{ab}=-i\frac1N\mathbf x_a\cdot\hat{\mathbf x}_b& - \\ - &D_{ab}=\frac1N\hat{\mathbf x}_a\cdot\hat{\mathbf x}_b - &G_{ab}=\frac1N\bar{\pmb\eta}_a^T\pmb\eta_b& - \\ - &Q_{ab}^{\alpha\gamma}=\frac1N\mathbf s_a^\alpha\cdot\mathbf s_b^\gamma - &X^\alpha_{ab}=\frac1N\mathbf x_a\cdot\mathbf s_b^\alpha& - \\ - &\hat X^\alpha_{ab}=-i\frac1N\hat{\mathbf x}_a\cdot\mathbf s_b^\alpha&& - \end{aligned} -\end{equation} + \quad + &R_{ab}=-i\frac1N\mathbf x_a\cdot\hat{\mathbf x}_b + \quad + &D_{ab}=\frac1N\hat{\mathbf x}_a\cdot\hat{\mathbf x}_b& \notag \\ + &Q_{ab}^{\alpha\gamma}=\frac1N\mathbf s_a^\alpha\cdot\mathbf s_b^\gamma + \quad + &\hat X^\alpha_{ab}=-i\frac1N\hat{\mathbf x}_a\cdot\mathbf s_b^\alpha + \quad + &X^\alpha_{ab}=\frac1N\mathbf x_a\cdot\mathbf s_b^\alpha& + \notag \\ + &G_{ab}=\frac1N\bar{\pmb\eta}_a\cdot\pmb\eta_b + \label{eq:order.parameters} +\end{align} Order parameters that mix the normal and Grassmann variables generically vanish in these settings and we don't consider them here \cite{Kurchan_1992_Supersymmetry}. This transformation changes the measure of the integral, with \begin{equation} \begin{aligned} - &\prod_{a=1}^nd\mathbf x_a\,\frac{d\hat{\mathbf x}_a}{(2\pi)^N}\,d\bar{\pmb\eta}_a\,d\pmb\eta\,\prod_{\alpha=1}^{m_a}d\mathbf s_a^\alpha \\ + &\prod_{a=1}^nd\mathbf x_a\,\frac{d\hat{\mathbf x}_a}{(2\pi)^N}\, + d\bar{\pmb\eta}_a\,d\pmb\eta_a\,\prod_{\alpha=1}^{m_a}d\mathbf s_a^\alpha \\ &\quad=dC\,dR\,dD\,dG\,dQ\,dX\,d\hat X\,(\det J)^{N/2}(\det G)^{-N} \end{aligned} \end{equation} -where $J$ is the Jacobian of the transformation in the real-valued fields and takes the form +where $J$ is the Jacobian of the transformation in the real-valued fields. 
This +Jacobian takes a block form \begin{equation} \label{eq:coordinate.jacobian} J=\begin{bmatrix} C&iR&X_1&\cdots&X_n \\ @@ -824,14 +829,14 @@ where $J$ is the Jacobian of the transformation in the real-valued fields and ta X_n^T&i\hat X_n^T&Q_{n1}&\cdots&Q_{nn} \end{bmatrix} \end{equation} -and the contribution of the Grassmann integrals produces its own inverted -Jacobian. The block matrices indicated above are such that $Q_{ab}$ is an -$m_a\times m_b$ matrix indexed by the upper indices, while $X_a$ is an $n\times +The Grassmann integrals produce their own inverted +Jacobian. The matrices that make up the blocks of the matrix $J$ are such that $C$, $R$, and $D$ are $n\times n$ matrices indexed by their lower indices, $Q_{ab}$ is an +$m_a\times m_b$ matrix indexed by its upper indices, while $X_a$ is an $n\times m_a$ matrix with one lower and one upper index. -After these steps, which follow identically to those more carefully outlined in -the cited papers \cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}, we arrive -at a form for the complexity of +These steps follow identically to those more carefully outlined in +the cited papers \cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}. Following them in the present case, we arrive +at a form for the complexity of stationary points with fixed energy $E$, stability $\mu$, and lowest eigenvalue $\lambda^*$ with \begin{widetext} \begin{equation} \label{eq:spherical.complexity} \begin{aligned} @@ -868,21 +873,20 @@ at a form for the complexity of The exponential integrand is split into two effective actions coupled only by a residual determinant. 
The first of these actions is the usual effective action for the complexity of the spherical spin glasses, or -\begin{equation} +\begin{equation} \label{eq:spherical.action} \begin{aligned} &\mathcal S_\mathrm{SSG}(\hat\beta,C,R,D,G\mid E,\mu) - =\hat\beta E-(r_d+g_d)\mu \\ - &+\lim_{n\to0}\frac1n\left\{\frac12\sum_{ab}\left( + =\hat\beta E+\lim_{n\to0}\frac1n\bigg\{-\mu\operatorname{Tr}(R+G) \\ + &\qquad+\frac12\sum_{ab}\left( \hat\beta^2f(C_{ab}) +\big(2\hat\beta R_{ab}-D_{ab}\big)f'(C_{ab}) +(R_{ab}^2-G_{ab}^2)f''(C_{ab}) \right) +\frac12\log\det\begin{bmatrix}C&iR\\iR^T&D\end{bmatrix} - -\log\det G\right\} + -\log\det G\bigg\} \end{aligned} \end{equation} -where $r_d$ and $g_d$ are the diagonal elements of $R$ and $G$, respectively. -The second of these actions is analogous to \eqref{eq:goe.action} and contains +The second of these actions is analogous to the effective action \eqref{eq:goe.action} from the GOE example of Section~\ref{sec:shifted.GOE} and contains the contributions from the marginal pieces of the calculation, and is given by \begin{equation} \begin{aligned} @@ -896,8 +900,8 @@ the contributions from the marginal pieces of the calculation, and is given by \bigg) +2\sum_{ab}^nf''(C_{ab}) \\ - &\qquad\times\Bigg[\beta\sum_\alpha^{m_a}\left( - \sum_\gamma^{m_b}(Q_{ab}^{\alpha\gamma})^2 + &\quad\times\Bigg[\beta\sum_\alpha^{m_a}\left( + \beta\sum_\gamma^{m_b}(Q_{ab}^{\alpha\gamma})^2 -\hat\beta(X_{ab}^\alpha)^2 -2X_{ab}^\alpha\hat X_{ab}^\alpha \right) @@ -920,10 +924,9 @@ spin glass. In Section \ref{sec:least.squares} we will study a model whose energy is not Gaussian and where such a decomposition is impossible. There are some dramatic simplifications that emerge from the structure of this -particular problem. First, notice that (outside of the `volume' term due to -$J$) the dependence on the parameters $X$ and $\hat X$ are purely quadratic. +particular problem. 
First, notice that the dependence on the parameters $X$ and $\hat X$ is purely quadratic. Therefore, there will always be a saddle point condition where they are both -zero. In this case, we except this solution to be correct. We can reason about +zero. In this case without a fixed or random field, we expect this solution to be correct. We can reason about why this is so: $X$, for instance, quantifies the correlation between the typical position of stationary points and the direction of their typical eigenvectors. In a landscape without a signal, where no direction is any more @@ -934,18 +937,18 @@ external field, the preferred direction can polarize both the direction of typical stationary points \emph{and} their soft eigenvectors. Therefore, in these instances one must account for solutions with nonzero $X$ and $\hat X$. -We further expect that $Q_{ab}=0$ for $a\neq b$. For the contrary to be true, +We similarly expect that $Q_{ab}=0$ for $a\neq b$. For the contrary to be true, eigenvectors at independently sampled stationary points would need to have their directions correlated. This is expected in situations with a signal, where such correlations would be driven by a shared directional bias towards -the signal direction. In the present situation, where there is not signal, such +the signal. In the present situation, where there is no signal, such correlations do not exist. When we take $X=\hat X=0$ and $Q^{\alpha\beta}_{ab}=\delta_{ab}Q^{\alpha\beta}$, we find that \begin{equation} \mathcal U_\mathrm{SSG}(\hat\lambda,Q,0,0\mid\beta,\lambda^*,\mu,C) - =\mathcal U_\mathrm{GOE}(\hat\lambda,Q\mid\mu,\lambda^*,\beta) + =\mathcal U_\mathrm{GOE}(\hat\lambda,Q\mid\beta,\lambda^*,\mu) \end{equation} with $\sigma^2=f''(1)$. That is, the effective action for the terms related to fixing the eigenvalue in the spherical Kac--Rice problem is exactly the same as @@ -985,21 +988,26 @@ evaluated at a fixed trace $\mu_\mathrm m$ of the Hessian. 
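A minimal sketch of how the marginal shift quoted in this subsection, $\mu_\mathrm m=2\sqrt{f''(1)}$, could be evaluated for a given covariance. It assumes the covariance is supplied as series coefficients of $f(q)=\sum_pa_pq^p$; the function names and the example covariance $f(q)=\frac12q^3$ are illustrative assumptions and do not come from the manuscript.

```python
import math


def f_second_derivative_at_one(coeffs):
    """Return f''(1) for f(q) = sum_p a_p q^p, given coeffs = {p: a_p}."""
    return sum(p * (p - 1) * a for p, a in coeffs.items())


def marginal_shift(coeffs):
    """Return mu_m = 2 sqrt(f''(1)), the shift that puts the GOE edge at zero."""
    fpp = f_second_derivative_at_one(coeffs)
    if fpp < 0:
        raise ValueError("f''(1) must be non-negative for a valid covariance")
    return 2.0 * math.sqrt(fpp)


# Hypothetical example covariance (an assumption): f(q) = q^3 / 2
pure_3_spin = {3: 0.5}
mu_m = marginal_shift(pure_3_spin)
print(mu_m)
```

Because the Hessian spectrum has the same width at every stationary point of these models, this single number fixes the stability of all marginal minima, which is why no variational search over $\mu$ is needed here.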
\subsection{Multispherical spin glasses} \label{sec:multispherical} -The multispherical models are a simple extension of the spherical ones, where +The multispherical spin glasses are a simple extension of the spherical ones, where the configuration space is taken to be the union of more than one hypersphere. Here we consider the specific case where the configuration space is the union -of two $(N-1)$-spheres, with $\Omega=S^{N-1}\times S^{N-1}$, and where the -energy is given by +of two $(N-1)$-spheres, with $\Omega=S^{N-1}\times S^{N-1}$. The two spheres +give rise to two constraints: for $\mathbf x=[\mathbf x^{(1)},\mathbf x^{(2)}]$ +with components $\mathbf x^{(1)},\mathbf x^{(2)}\in\mathbb R^N$, the +constraints are $0=g_1(\mathbf x)=\frac12(\|\mathbf x^{(1)}\|^2-N)$ and +$0=g_2(\mathbf x)=\frac12(\|\mathbf x^{(2)}\|^2-N)$. These two constraints are +fixed by two Lagrange multipliers $\omega_1$ and $\omega_2$. + +The energy in our multispherical spin glass is given by \begin{equation} H(\mathbf x)=H_1(\mathbf x^{(1)})+H_2(\mathbf x^{(2)})-\epsilon\mathbf x^{(1)}\cdot\mathbf x^{(2)} \end{equation} -for $\mathbf x=[\mathbf x^{(1)},\mathbf x^{(2)}]$ for components $\mathbf -x^{(1)},\mathbf x^{(2)}\in\mathbb R^N$. The two spheres give rise to two constraints $0=g_1(\mathbf x^{(1)})=\frac12(\|\mathbf x^{(1)}\|^2-N)$ and $0=g_2(\mathbf x^{(2)})=\frac12(\|\mathbf x^{(2)}\|^2-N)$, and similarly two Lagrange multipliers $\omega_1$ and $\omega_2$. 
Each individual sphere energy $H_s$ is -taken to be a centered Gaussian random function with a covariance given in the -usual spherical way for $\mathbf x_1,\mathbf x_2\in\mathbb R^N$ by +The energy $H_i$ of each individual sphere is taken to be a centered Gaussian +random function with a covariance given in the usual spherical spin glass way +for $\mathbf x,\mathbf x'\in\mathbb R^N$ by \begin{equation} - \overline{H_i(\mathbf x_1)H_j(\mathbf x_2)} - =N\delta_{ij}f_i\left(\frac{\mathbf x_1\cdot\mathbf x_2}N\right) + \overline{H_i(\mathbf x)H_j(\mathbf x')} + =N\delta_{ij}f_i\left(\frac{\mathbf x\cdot\mathbf x'}N\right) \end{equation} with the functions $f_1$ and $f_2$ not necessarily the same. @@ -1012,13 +1020,7 @@ use configuration spaces involving spheres of different sizes Bates_2022_Free, Huang_2023_Strong, Huang_2023_Algorithmic, Huang_2024_Optimization}. -Because the energy is Gaussian, properties of the Hessian are once again -statistically independent of those of the energy and gradient. However, unlike -the previous example of the spherical models, the spectrum of the Hessian at -different points in the configuration space has different shapes. This appears -in this problem through the presence of a configuration space defined by -multiple constraints, and therefore multiple Lagrange multipliers are necessary -to ensure they are all fixed. The resulting Lagrangian, gradient, and Hessian are +The Lagrangian to be extremized to find stationary points and its gradient and Hessian are \begin{align} &\begin{aligned} L(\mathbf x)&=H(\mathbf x) @@ -1037,15 +1039,16 @@ to ensure they are all fixed. 
The resulting Lagrangian, gradient, and Hessian ar &\operatorname{Hess}H(\mathbf x,\pmb\omega) \\ &\quad= \begin{bmatrix} - \partial_1\partial_1H_1(\mathbf x^{(1)})+\omega^{(1)}I&-\epsilon I \\ - -\epsilon I&\partial_2\partial_2H_2(\mathbf x^{(2)})+\omega^{(2)}I + \partial_1\partial_1H_1(\mathbf x^{(1)})+\omega_1I&-\epsilon I \\ + -\epsilon I&\partial_2\partial_2H_2(\mathbf x^{(2)})+\omega_2I \end{bmatrix} \end{aligned} \end{align} -Like in the spherical model, fixing the trace of the Hessian to $\mu$ is +where $\partial_1=\frac\partial{\partial\mathbf x^{(1)}}$ and $\partial_2=\frac\partial{\partial\mathbf x^{(2)}}$. +Like in the spherical spin glasses, fixing the trace of the Hessian to $\mu$ is equivalent to a constraint on the Lagrange multipliers. However, in this case it corresponds to $\mu=\omega_1+\omega_2$, and therefore they are not -uniquely fixed by the trace. +uniquely fixed by fixing $\mu$. Since the energy in the multispherical models is Gaussian, the properties of the matrix $\partial\partial H$ are again independent of the energy and @@ -1060,11 +1063,11 @@ spectral density of the Hessian in these models using standard methods. Because of the independence of the Hessian, the method introduced in this article is not necessary to characterize the marginal minima of this system. Rather, we could take the spectral density derived in -Appendix~\ref{sec:multispherical.spectrum} and found the Lagrange multipliers +Appendix~\ref{sec:multispherical.spectrum} and find the Lagrange multipliers $\omega_1$ and $\omega_2$ corresponding with marginality by tuning the edge of the spectrum to zero. In some ways the current method is more convenient than this, since it is a purely variational method and therefore can be reduced to a -since root-finding exercise. +single root-finding exercise. 
Unlike the constraints on the configurations $\mathbf x$, the constraint on the tangent vectors $\mathbf s=[\mathbf s^{(1)},\mathbf s^{(2)}]\in\mathbb R^{2N}$ @@ -1080,11 +1083,11 @@ multispherical model need not be equally spread on the two subspaces, but can be concentrated in one or the other. The calculation of the marginal complexity in this problem follows very closely -to that of the spherical spin glasses in the previous subsection, making -immediately the simplifying assumptions that the soft directions of different +to that of the spherical spin glasses in the previous subsection. We +immediately make the simplifying assumptions that the soft directions of different stationary points are typically uncorrelated and therefore $X=\hat X=0$ and the overlaps $Q$ between eigenvectors are only nonzero when in the same replica. -The result has the schematic form of \eqref{eq:spherical.complexity}, but with +The result for the complexity has the schematic form of \eqref{eq:spherical.complexity}, but with different effective actions depending now on overlaps inside each of the two spheres and between the two spheres. 
The effective action for the traditional complexity of the multispherical spin glass is @@ -1093,12 +1096,14 @@ complexity of the multispherical spin glass is \begin{aligned} &\mathcal S_\mathrm{MSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11},C^{22},R^{22},D^{22},G^{22},C^{12},R^{12},R^{21},D^{12},G^{12},G^{21} \mid E,\omega_1,\omega_2)= \hat\beta(E-E_1-E_2-\epsilon c_d^{12})\\ - &\quad-\epsilon(r^{12}_d+r^{21}_d+g^{12}_d+g^{21}_d) + & +\mathcal S_\mathrm{SSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11}\mid E_1,\omega_1) +\mathcal S_\mathrm{SSG}(\hat\beta,C^{22},R^{22},D^{22},G^{22}\mid E_2,\omega_2) + +\lim_{n\to0}\frac1n\Bigg\{ + \epsilon\operatorname{Tr}(R^{12}+R^{21}+G^{12}+G^{21}-\hat\beta C^{12}) \\ &\quad - +\lim_{n\to0}\frac1n\left\{\frac12\log\det\left( + +\frac12\log\det\left( I- \begin{bmatrix}C^{11}&iR^{11}\\iR^{11}&D^{11}\end{bmatrix}^{-1} \begin{bmatrix} @@ -1109,18 +1114,18 @@ complexity of the multispherical spin glass is C^{12} & iR^{21} \\ iR^{21} & D^{12} \end{bmatrix} \right) - -\log\det(I-(G^{11}G^{22})^{-1}G^{12}G^{21})\right\} + -\log\det(I-(G^{11}G^{22})^{-1}G^{12}G^{21})\Bigg\} \end{aligned} \end{equation} -which is the sum of two effective actions for the spherical spin glass +which is the sum of two effective actions \eqref{eq:spherical.action} for the spherical spin glass associated with each individual sphere, and some coupling terms. The order parameters are defined the same as in the spherical spin glasses, but now with raised indices to indicate whether the vectors come from one or the other spherical subspace. 
The effective action for the eigenvalue-dependent part of the complexity is likewise given by -\begin{equation} +\begin{equation} \label{eq:multispherical.marginal.action} \begin{aligned} - &\mathcal U_\mathrm{MSG}(\hat q,\hat\lambda,Q^{11},Q^{22},Q^{12}\mid\lambda^*,\omega_1,\omega_2,\beta) \\ + &\mathcal U_\mathrm{MSG}(\hat q,\hat\lambda,Q^{11},Q^{22},Q^{12}\mid\beta,\lambda^*,\omega_1,\omega_2) \\ &\quad=\lim_{m\to0}\bigg\{\sum_{\alpha=1}^m\left[\hat q^\alpha(Q^{11,\alpha\alpha}+Q^{22,\alpha\alpha}-1)-\beta(\omega_1Q^{11,\alpha\alpha}+\omega_2Q^{22,\alpha\alpha}-2\epsilon Q^{12,\alpha\alpha})\right] +\hat\lambda(\omega_1Q^{11,11}+\omega_2Q^{22,11}-2\epsilon Q^{12,11}) \\ &\qquad\qquad+\sum_{i=1,2}f_i''(1)\left[\beta^2\sum_{\alpha\gamma}^m(Q^{ii,\alpha\gamma})^2+2\beta\hat\lambda\sum_\alpha^m(Q^{ii,1\alpha})^2+\hat\lambda^2(Q^{ii,11})^2\right] @@ -1134,14 +1139,10 @@ the complexity is likewise given by \end{widetext} The new variables $\hat q^\alpha$ are Lagrange multipliers introduced to enforce the constraint that $Q^{11,\alpha\alpha}+Q^{22,\alpha\alpha}=1$. - -The biggest change between this problem and the -spherical one is that now the spherical constraint in the tangent space at each -stationary point gives the constraint on the order parameters -$q^{11}_d+q^{22}_d=1$. Therefore, the diagonal of the $Q$ matrices cannot be -taken to be 1 as before. To solve the marginal problem, we take each of the -matrices $Q^{11}$, $Q^{22}$, and $Q^{12}$ to have the planted replica symmetric -form \eqref{eq:Q.structure}, but with the diagonal not necessarily equal to 1, so +Because of this constraint, the diagonal of the $Q$ matrices cannot be +taken to be 1 as in Section~\ref{sec:shifted.GOE}. 
Instead we take each of the matrices $Q^{11}$, $Q^{22}$, and $Q^{12}$ to have +the planted replica symmetric form of \eqref{eq:Q.structure}, but with the +diagonal not necessarily equal to 1, so \begin{equation} Q^{ij}=\begin{bmatrix} \tilde q^{ij}_d & \tilde q^{ij}_0 & \tilde q^{ij}_0 & \cdots & \tilde q^{ij}_0 \\ @@ -1152,12 +1153,11 @@ form \eqref{eq:Q.structure}, but with the diagonal not necessarily equal to 1, s \end{bmatrix} \end{equation} This requires us to introduce two new order parameters per pair $(i,j)$. When -this ansatz is inserted into the expression for the effective action and the +this ansatz is inserted into the expression \eqref{eq:multispherical.marginal.action} for the effective action and the limit of $m\to0$ is taken, we find \begin{widetext} - \begin{equation} - \begin{aligned} - &\mathcal U_\mathrm{MSG}(\hat q,\hat{\tilde q},\hat\lambda,\tilde q_d^{11},\tilde q_0^{11},q_d^{11},q_0^{11},\tilde q_d^{22},\tilde q_0^{22},q_d^{22},q_0^{22},\tilde q_d^{12},\tilde q_0^{12},q_d^{12},q_0^{12}\mid\lambda^*,\omega_1,\omega_2,\beta) \\ + \begin{align} + &\mathcal U_\mathrm{MSG}(\hat q,\hat{\tilde q},\hat\lambda,\tilde q_d^{11},\tilde q_0^{11},q_d^{11},q_0^{11},\tilde q_d^{22},\tilde q_0^{22},q_d^{22},q_0^{22},\tilde q_d^{12},\tilde q_0^{12},q_d^{12},q_0^{12}\mid\beta,\lambda^*,\omega_1,\omega_2) \notag \\ &=\sum_{i=1,2}\left\{f_i''(1)\left[ \beta^2\left( (\tilde q^{ii}_d)^2 @@ -1172,7 +1172,7 @@ limit of $m\to0$ is taken, we find \right] +\hat\lambda\tilde q^{ii}_d\omega_i -\beta(\tilde q^{ii}_d-q^{ii}_d)\omega_i - \right\} \\ + \right\} \notag \\ &+\frac12\log\bigg[ \left( 2q^{12}_0\tilde q^{12}_0-\tilde q^{12}_0(\tilde q^{12}_d+q^{12}_d) @@ -1181,35 +1181,34 @@ limit of $m\to0$ is taken, we find \left( 2q^{12}_0\tilde q^{12}_0-\tilde q^{12}_0(\tilde q^{12}_d+q^{12}_d) -2q^{11}_0\tilde q^{22}_0+q^{11}_d\tilde q^{22}_0+\tilde q^{11}_0\tilde q^{22}_d - \right) \\ + \right) \notag \\ &\qquad\qquad+2\left(3(q^{12}_0)^2-(\tilde 
q^{12}_0)^2-2q^{12}_0q^{12}_d-3q^{11}_0q^{22}_0+q^{11}_dq^{22}_0+\tilde q^{11}_0\tilde q^{22}_0+q^{11}_0q^{22}_d \right)\left( (\tilde q^{12}_0)^2-(\tilde q^{12}_d)^2-\tilde q^{11}_0\tilde q^{22}_0+\tilde q^{11}_d\tilde q^{22}_d - \right) \\ + \right) \notag \\ &\qquad\qquad+\left( 2(q^{12}_0)^2-(\tilde q^{12}_0)^2-(q^{12}_d)^2-2q^{11}_0q^{22}_0+\tilde q^{11}_0\tilde q^{22}_0+q^{11}_dq^{22}_d \right)\left( (\tilde q^{12}_0)^2-(\tilde q^{12}_d)^2-\tilde q^{11}_0\tilde q^{22}_0+\tilde q^{11}_d\tilde q^{22}_d \right) \bigg] - \\ + \notag \\ &-\log\left[(q^{11}_d-q^{11}_0)(q^{22}_d-q^{22}_0)-(q^{12}_d-q^{12}_0)^2\right] -2\epsilon\big[\hat\lambda\tilde q^{12}_d -\beta(\tilde q^{12}_d-q^{12}_d)\big] +\hat q(q^{11}_d+q^{22}_d-1)+\hat{\tilde q}(\tilde q^{11}_d+\tilde q^{22}_d-1) - \end{aligned} - \end{equation} + \label{eq:multispherical.ansatz} + \end{align} \end{widetext} To make the limit to zero temperature, we once again need an ansatz for the asymptotic behavior of the overlaps. These take the form -$q^{ij}_0=q^{ij}_d-y^{ij}_0\beta^{-1}-z^{ij}_0\beta^{-2}$, with the same for -the tilde variables. Notice that in this case, the asymptotic behavior of the +$q^{ij}_0=q^{ij}_d-y^{ij}_0\beta^{-1}-z^{ij}_0\beta^{-2}$. Notice that in this case, the asymptotic behavior of the off-diagonal elements is to approach the value of the diagonal rather than to approach one. We also require $\tilde q^{ij}_d=q^{ij}_d-\tilde y^{ij}_d\beta^{-1}-\tilde z^{ij}_d\beta^{-2}$, i.e., that the tilde diagonal terms also approach the same diagonal value as the untilde terms, but with potentially different rates. -As before, in order for the volume term to stay finite, there are necessary +As before, in order for the logarithmic term to stay finite, there are necessary constraints on the values $y$. These are \begin{align} \frac12(y^{11}_d-\tilde y^{11}_d)=y^{11}_0-\tilde y^{11}_0 \\ @@ -1222,19 +1221,18 @@ the diagonal elements are not necessarily equal, we have a more general relationship. 
When the $\beta$-dependence of the $q$ variables is inserted into the effective -action and the limit taken, we find an expression that is too large to report +action \eqref{eq:multispherical.ansatz} and the limit $\beta\to\infty$ taken, we find an expression that is too large to report here. However, it can be extremized over all of the variables in the problem just as in the previous examples to find the values of the Lagrange multipliers $\omega_1$ and $\omega_2$ corresponding to marginal minima. -Fig.~\ref{fig:msg.marg}(a) shows examples of the marginal $\omega_1$ and -$\omega_2$ for a variety of couplings $\epsilon$ when the covariances of the +Fig.~\ref{fig:msg.marg}(a) shows examples of the $\omega_1$ and +$\omega_2$ corresponding to marginal spectra for a variety of couplings $\epsilon$ when the covariances of the energy on the two spherical subspaces are such that $1=f_1''(1)=f_2''(1)$. Fig.~\ref{fig:msg.marg}(b) shows the Hessian spectra associated with some specific pairs $(\omega_1,\omega_2)$. When $\epsilon=0$ and the two spheres are -uncoupled, we find the result for two independent spherical spin glasses. If +uncoupled, we find the result for two independent spherical spin glasses: if either $\omega_1=2\sqrt{f''(1)}=2$ or $\omega_2=2\sqrt{f''(1)}=2$ and the other -Lagrange multiplier is larger than 2, then we have a marginal minimum in the -uncoupled case, made up of the Cartesian product of a marginal minimum on one +Lagrange multiplier is larger than 2, then we have a marginal minimum made up of the Cartesian product of a marginal minimum on one subspace and a stable minimum on the other. \begin{figure} @@ -1276,31 +1274,32 @@ $\epsilon$ is increased, the most common type of marginal minimum drifts toward points with $\omega_1>\omega_2$. Multispherical spin glasses may be an interesting platform for testing ideas -about which among the possible marginal minima actually attract the dynamics, -and which do not. 
In the limit where $\epsilon=0$ and the configurations of the -two spheres are independent, the minima found should be marginal on both -sphere's energies. Just because technically on the expanded configuration space -a deep and stable minimum on one sphere and a marginal minimum on the other is +about which among the possible marginal minima can attract dynamics, +and which cannot. In the limit where $\epsilon=0$ and the configurations of the +two spheres are independent, the minima found dynamically should be marginal on both +subspaces. Just because technically on the expanded configuration space +the Cartesian product of a deep stable minimum on one sphere and a marginal minimum on the other is a marginal minimum on the whole space doesn't mean the deep and stable minimum is any easier to find. This intuitive idea that is precise in the zero-coupling limit should continue to hold at small nonzero coupling, and perhaps reveal something about the inherent properties of marginal minima that do not tend to be found by algorithms. -\subsection{Random sums of squares} +\subsection{Sums of squared random functions} \label{sec:least.squares} In this subsection we consider perhaps the simplest example of a non-Gaussian -landscape: the problem of random nonlinear least squares optimization. Though, -for reasons we will see it is easier to make predictions for random nonlinear +landscape: the problem of sums of squared random functions. This problem has a close resemblance to nonlinear least squares optimization. Though, +for reasons we will see it is easier to make predictions for nonlinear \emph{most} squares, i.e., the problem of maximizing the sum of squared terms. 
-We again take a spherical configuration space with $\mathbf x\in S^{N-1}$ and $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$ as in the spherical spin glasses, and consider a set +We again take a spherical configuration space with $\mathbf x\in S^{N-1}$ and $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$ as in the spherical spin glasses. The energy is built from a set of $M=\alpha N$ random functions $V_k:\mathbf S^{N-1}\to\mathbb R$ that are centered Gaussians with covariance \begin{equation} \overline{V_i(\mathbf x)V_j(\mathbf x')}=\delta_{ij}f\left(\frac{\mathbf x\cdot\mathbf x'}N\right) \end{equation} -The energy is minus the sum of squares of the $V_k$, or +Each of the $V_k$ is an independent spherical spin glass. +The total energy is minus the sum of squares of the $V_k$, or \begin{equation} \label{eq:ls.hamiltonian} H(\mathbf x)=-\frac12\sum_{k=1}^MV_k(\mathbf x)^2 \end{equation} @@ -1309,35 +1308,37 @@ least-squares version of this problem were recently studied in a linear context, with $f(q)=\sigma^2+aq$ \cite{Fyodorov_2020_Counting, Fyodorov_2022_Optimization}. Some results on the ground state of the general nonlinear problem can also be found in \cite{Tublin_2022_A}, and a solution to -the equilibrium problem can be found in \cite{Urbani_2023_A}. In particular, -that work indicates that the low-lying minima of the least squares problem tend +the equilibrium problem can be found in \cite{Urbani_2023_A}. +Those works indicate that the low-lying minima of the least squares problem tend to be either replica symmetric or full replica symmetry breaking. To avoid either a trivial analysis or a very complex one, we instead focus on maximizing the sum of squares, or minimizing \eqref{eq:ls.hamiltonian}. -Fortunately, the \emph{maxima} of this problem have a more amenable structure -for study, as they are typically described by {\oldstylenums1}\textsc{rsb}-like structure. 
There is a -heuristic intuition for this: in the limit of $M\to1$, this problem is just the +The minima of \eqref{eq:ls.hamiltonian} have a more amenable structure +for study than the maxima, as they are typically described by a {\oldstylenums1}\textsc{rsb}-like structure. There is a +heuristic intuition for this: in the limit of $M\to1$, this problem is just minus the square of a spherical spin glass landscape. The distribution and properties of stationary points low and high in the spherical spin glass are not changed, -except that their energies are stretched and minima are transformed into -maxima. This is why the top of the landscape doesn't qualitatively change. The -bottom, however, consists of the zero-energy level set in the spherical spin -glass. This level set is well-connected, and so the ground states should also +except that their energies are stretched and maxima are transformed into +minima. Therefore, the bottom of the landscape doesn't qualitatively change. The +top, however, consists of the zero-energy level set in the spherical spin +glass. This level set is well-connected, and so the highest states should also be well connected and flat. -Focusing on the top of the landscape and therefore dealing with a {\oldstylenums1}\textsc{rsb}-like -problem is good for our analysis. Algorithms will tend to be stuck in the ways -they are for hard optimization problems, and we will be able to explicitly +Focusing on the bottom of the landscape and therefore dealing with a {\oldstylenums1}\textsc{rsb}-like +problem makes our analysis easier. Algorithms will tend to be stuck in the ways +they are in hard optimization problems, and we will be able to predict where. Therefore, we will study the most squares problem rather than the least squares one. 
We calculate the complexity of minima of \eqref{eq:ls.hamiltonian} in Appendix~\ref{sec:dominant.complexity}, which corresponds to maximizing the sum of squares, under a replica symmetric ansatz (which covers {\oldstylenums1}\textsc{rsb}-like problems) for arbitrary covariance $f$, and we -calculate the complexity of marginal minima in this section.. +calculate the complexity of marginal minima in this section. -Applying the Lagrange multiplier method detailed above to enforce the spherical constraint, the gradient and Hessian are +As in the previous sections, we used the method of Lagrange multipliers to analyse stationary points on the constrained configuration space. The Lagrangian and its associated gradient and Hessian are \begin{align} + &L(\mathbf x,\omega) + =-\frac12\bigg(\sum_k^MV_k(\mathbf x)^2-\omega\big(\|\mathbf x\|^2-N\big)\bigg) \\ &\nabla H(\mathbf x,\omega) =-\sum_k^MV_k(\mathbf x)\partial V_k(\mathbf x)+\omega\mathbf x \\ @@ -1347,10 +1348,9 @@ Applying the Lagrange multiplier method detailed above to enforce the spherical -V_k(\mathbf x)\partial\partial V_k(\mathbf x)\right]+\omega I \end{aligned} \end{align} -As in the spherical and multispherical models, fixing the trace of the Hessian -at largest order in $N$ is equivalent to constraining the value of the Lagrange -multiplier $\omega=\mu$, since the trace of the random parts of the Hessian -matrix contribute typical values at a lower order in $N$. +As in the spherical and multispherical spin glasses, fixing the trace of the Hessian +is equivalent to constraining the value of the Lagrange +multiplier $\omega=\mu$. The derivation of the marginal complexity for this model is complicated, but can be made schematically like that of the derivation of the equilibrium free @@ -1383,7 +1383,7 @@ Dirac $\delta$ function fixing the value of the energy for each replica, or \big(V_k(\pmb\phi_a^\alpha(1,2))-v_{ka}^\alpha(1,2)\big) \right] \end{equation} -where we have introduced auxiliary fields $\hat v$. 
With this inserted into the +where we have introduced auxiliary superfields $\hat v$. With this inserted into the integral, all other instances of $V$ are replaced by $v$, and the only remaining dependence on the disorder is from the term $\hat vV$ arising from the Fourier representation of the Dirac $\delta$ function. This term is linear @@ -1396,12 +1396,12 @@ in $V$, and therefore the random functions can be averaged over to produce \right] } = - -\frac N2\sum_{ab}^n\sum_{\alpha\gamma}^{m_a}\sum_k^M\int d1\,d2\,d3\,d4\, + -\frac12\sum_{ab}^n\sum_{\alpha\gamma}^{m_a}\sum_k^M\int d1\,d2\,d3\,d4\, \hat v_{ka}^\alpha(1,2)f\big(\pmb\phi_a^\alpha(1,2)\cdot\pmb\phi_b^\gamma(3,4)\big)\hat v_{kb}^\gamma(3,4) \end{equation} \end{widetext} The entire integrand is now factorized in the indices $k$ and quadratic in the -$v$ and $\hat v$ with the kernel +superfields $v$ and $\hat v$ with the kernel \begin{equation} \begin{bmatrix} B^\alpha(1,2)\delta(1,3)\delta(2,4)\delta_{ab}\delta^{\alpha\gamma} @@ -1410,7 +1410,7 @@ $v$ and $\hat v$ with the kernel & f\big(\pmb\phi_a^\alpha(1,2)\cdot\pmb\phi_b^\gamma(3,4)\big) \end{bmatrix} \end{equation} -The integration over the $v$ and $\hat v$ results in a term in the effective action of the form +The integration over $v$ and $\hat v$ results in a term in the effective action of the form \begin{equation} \label{eq:sdet.1} \begin{aligned} &-\frac M2\log\operatorname{sdet}\bigg[ @@ -1425,56 +1425,51 @@ scalar products of the real and Grassmann vectors that make up $\pmb\phi$. The change of variables to these order parameters again results in the Jacobian of \eqref{eq:coordinate.jacobian}, contributing \begin{equation} - \frac N2\log\det J(C,R,D,Q,X,\hat X)-\frac N2\log\det G^2 + \frac N2\log\det J-\frac N2\log\det G^2 \end{equation} to the effective action. -Up to this point, the expressions above are general and independent of a given +Up to this point, the expressions are general and independent of a given ansatz. 
However, we expect that the order parameters $X$ and $\hat X$ are zero, -since this case is isotropic. Applying this ansatz here avoids a dramatically -more complicated expression for the effective action found in the case with -arbitrary $X$ and $\hat X$. We also will apply the ansatz that $Q_{ab}^{\alpha\gamma}$ is zero for $a\neq b$, which is equivalent to assuming that the soft +since again we are in a setting with no signal or external field. Applying this ansatz here avoids a dramatically +more complicated expression for the effective action. We also will apply the ansatz that $Q_{ab}^{\alpha\gamma}$ is zero for $a\neq b$, which is equivalent to assuming that the soft directions of typical pairs of stationary points are uncorrelated, and further that $Q^{\alpha\gamma}=Q_{aa}^{\alpha\gamma}$ independently of the index $a$, implying that correlations in the tangent space of typical stationary points are the same. -Given these simplifying forms of the ansatz, taking the superdeterminant in +Given this ansatz, taking the superdeterminant in \eqref{eq:sdet.1} yields \begin{widetext} -\begin{equation} - \begin{aligned} - \log\det\left\{ +\begin{align} + &-\frac M2\log\det\left\{ \left[ f'(C)\odot D-\hat\beta I+\left(R^{\circ2}-G^{\circ2}+I\sum_{\alpha\gamma}2(\delta^{\alpha1}\hat\lambda+\beta)(\delta^{\gamma1}\hat\lambda+\beta)(Q^{\alpha\gamma})^2\right)\odot f''(C) \right]f(C) +(I-R\odot f'(C))^2 - \right\} \\ - +n\log\det_{\alpha\gamma}(\delta_{\alpha\gamma}-2(\delta_{\alpha1}\hat\lambda+\beta)Q^{\alpha\gamma}) - -2\log\det\big[I+G\odot f'(C)\big] - \end{aligned} -\end{equation} + \right\} \notag \\ + &\hspace{16em}-n\frac M2\log\det\big[\delta_{\alpha\gamma}-2(\delta_{\alpha1}\hat\lambda+\beta)Q^{\alpha\gamma}\big] + +M\log\det\big[I+G\odot f'(C)\big] +\end{align} \end{widetext} where once again $\odot$ is the Hadamard product and $A^{\circ n}$ gives the Hadamard power of $A$. 
We can already see one substantive difference between the structure of this problem and that of the spherical models: the effective action in this case mixes the order parameters $G$ due to the Grassmann variables with the -ones $C$, $R$, and $D$ due to the other variables. Notice further that the dependence on $Q$ due to the marginal constraint is likewise no longer separable. This is the realization of -the fact that the Hessian properties are no longer independent of the energy +ones $C$, $R$, and $D$ due to the other variables. Notice further that the dependence on $Q$ due to the marginal constraint is likewise no longer separable into its own term. This is the realization of +the fact that the Hessian is no longer independent of the energy and gradient. Now that we have reduced the problem to an extremal one over the order parameters $\hat\beta$, $\hat\lambda$, $C$, $R$, $D$, $G$, and $Q$, it is time to make an ansatz for the form of order we expect to find. We will focus on a regime where the structure of stationary points is replica symmetric, and further where -typical pairs of stationary points have no overlap. This requires that $f(0)=0$, or that there is no constant term in the random polynomials. This gives +typical pairs of stationary points have no overlap. This requires that $f(0)=0$, or that there is no constant term in the random functions. This gives the ansatz \begin{align} C=I && R=rI && D = dI && G = gI \end{align} We further take a planted replica symmetric structure for the matrix $Q$, -identical to that in \eqref{eq:Q.structure}. The resulting effective action is -the same as if we had made an annealed calculation in the complexity, though -the previous expressions are general. We find an expression of the form +identical to that in \eqref{eq:Q.structure}. This results in \begin{equation} \begin{aligned} &\Sigma_{\lambda^*}(E,\mu) @@ -1483,7 +1478,7 @@ the previous expressions are general. 
We find an expression of the form &\hspace{8em}\times e^{nN\mathcal S_\mathrm{RSS}(\hat\beta,\hat\lambda,r,d,g,q_0,\tilde q_0\mid\lambda^*,E,\mu,\beta)} \end{aligned} \end{equation} -for the effective action +with an effective action \begin{widetext} \begin{equation} \begin{aligned} @@ -1492,7 +1487,7 @@ for the effective action +\hat\lambda\lambda^* +\frac12\log\left(\frac{d+r^2}{g^2} \times\frac{1-2q_0+\tilde q_0^2}{(1-q_0)^2}\right) \\ - &\quad-\frac\alpha2\log\bigg( + &\quad-\frac\alpha2\log\Bigg( \frac{1-4f'(1)\big[\beta(1-q_0)+\frac12\hat\lambda-\beta(\beta+\hat\lambda)(1-2q_0+\tilde q_0^2)f'(1)\big]} {\big[1-2(1-q_0)\beta f'(1)\big]^2} \\ &\qquad\qquad\qquad\times @@ -1501,11 +1496,11 @@ for the effective action }{ \big[1+gf'(1)\big]^2 } - \bigg) + \Bigg) \end{aligned} \end{equation} We expect as before the limits of $q_0$ and $\tilde q_0$ as $\beta$ goes to -infinity to approach one, defining their asymptotic expansion as in +infinity to approach one, defining their asymptotic expansion like in \eqref{eq:q0.limit} and \eqref{eq:q0t.limit}. Upon making this substitution and taking the zero-temperature limit, we find \begin{equation} @@ -1529,7 +1524,8 @@ taking the zero-temperature limit, we find \end{aligned} \end{equation} \end{widetext} -We can finally write the complexity with fixed minimum eigenvalue $\lambda^*$ as +We can finally write the complexity with fixed energy $E$, stability $\mu$, and +minimum eigenvalue $\lambda^*$ as \begin{equation} \label{eq:rss.complexity} \begin{aligned} &\Sigma_{\lambda^*}(E,\mu) \\ @@ -1557,7 +1553,7 @@ introduced in this paper becomes necessary. } \label{fig:ls.complexity} \end{figure} -The marginal complexity can be derived from \eqref{eq:rss.complexity} using the \eqref{eq:marginal.stability} to fix $\mu$ to the marginal stability $\mu_\textrm m(E)$ and then by evaluating the complexity at that stability as in \eqref{eq:marginal.complexity}. 
+The marginal complexity can be derived from \eqref{eq:rss.complexity} using the condition \eqref{eq:marginal.stability} to fix $\mu$ to the marginal stability $\mu_\textrm m(E)$ and then evaluating the complexity at that stability as in \eqref{eq:marginal.complexity}. Fig.~\ref{fig:ls.complexity} shows the marginal complexity in a sum-of-squares model with $\alpha=\frac32$ and $f(q)=q^2+q^3$. Also shown is the dominant complexity computed in Appendix~\ref{sec:dominant.complexity}. As the figure @@ -1579,26 +1575,26 @@ Fig.~\ref{fig:ls.stability} shows the associated marginal stability $\mu_\mathrm m(E)$ for the same model. Recall that the definition of the marginal stability in \eqref{eq:marginal.stability} is that which eliminates the variation of $\Sigma_{\lambda^*}(E,\mu)$ with respect to $\lambda^*$ at the -point $\lambda^*=0$. Unlike the Gaussian spherical spin glass, this varies with -energy in a nontrivial way. That figure also shows the dominant stability, -which is the stability associated with the dominant complexity, which coincides +point $\lambda^*=0$. Unlike in the Gaussian spherical spin glass, in this model $\mu_\mathrm m(E)$ varies with +energy in a nontrivial way. The figure also shows the dominant stability, +which is the stability associated with the dominant complexity and coincides with the marginal stability only at the threshold energy. -In our companion paper, we explore the relationship between the marginal -complexity and the performance of two generic algorithms on this model: +In our companion paper, we use a sum of squared random functions model to explore the relationship between the marginal +complexity and the performance of two generic algorithms: gradient descent and approximate message passing \cite{Kent-Dobias_2024_Algorithm-independent}. We show that the range of energies where the marginal complexity is positive does effectively bound the performance of these algorithms. 
At the moment the comparison is restricted to models with small polynomial powers appearing in $f(q)$ and with small $\alpha$ -for computational reasons. However, with some of the \textsc{dmft} results -already found on these models it should be possible to make comparisons in a +for computational reasons. However, using the \textsc{dmft} results +already found for these models it should be possible to make comparisons in a wider family of models \cite{Kamali_2023_Dynamical, Kamali_2023_Stochastic}. -The results on the marginal complexity are complimentary to rigorous results on +The results for the marginal complexity are complementary to rigorous results on the performance of algorithms in the least squares case, which focus on bounds -for the parameters of $f$ necessary for zero-energy solutions to exist and be -found by algorithms \cite{Montanari_2023_Solving, Montanari_2024_On}. With more +for $\alpha$ and the parameters of $f$ necessary for zero-energy solutions to exist and be +found by algorithms \cite{Montanari_2023_Solving, Montanari_2024_On}. After more work to evaluate the marginal complexity in the full \textsc{rsb} case, it will be interesting to compare the bounds implied by the distribution of marginal minima with those made by other means. @@ -1607,26 +1603,24 @@ minima with those made by other means. \label{sec:conclusion} We have introduced a method for conditioning complexity on the marginality of -stationary points. This method is in principal completely general, and permits -this conditioning without first needing to understand the entire Hessian -statistics. We used our approach to study the marginal complexity in three +stationary points. This method is general, and permits +conditioning without first needing to understand the statistics of the Hessian at stationary points. 
We used our approach to study marginal complexity in three different models of random landscapes, showing that the method works and can be applied to models whose marginal complexity was not previously known. In our -companion paper, we further show that the marginal complexity in the third -model of random nonlinear least squares can be used to effectively bound +companion paper, we further show that marginal complexity in the third +model of sums of squared random functions can be used to effectively bound algorithmic performance \cite{Kent-Dobias_2024_Algorithm-independent}. -There are some limitations to the approach we have largely relied in this +There are some limitations to the approach we relied on in this paper. The main limitation is our restriction to signalless landscapes, where -there is no symmetry-breaking favored direction. This allowed us to neglect the -presence of stationary points with isolated eigenvalues as atypical, and -therefore apply the marginal conditioning using a variational principle. +there is no symmetry-breaking favored direction. This allowed us to treat stationary points with isolated eigenvalues as atypical, and +therefore find the marginal stability $\mu_\mathrm m$ using a variational principle. However, most models of interest in inference have a nonzero signal strength and therefore often have typical stationary points with an isolated eigenvalue. -As we described earlier, marginal complexity can still be analyzed in these +As we described, marginal complexity can still be analyzed in these systems by tuning the shift $\mu$ until the large-deviation principle breaks down and an imaginary part of the complexity appears. However, this is an -inconvenient measure. It's possible that a variational approach can be +inconvenient approach. It is possible that a variational approach can be preserved by treating the direction toward and the directions orthogonal to the signal differently. This problem merits further research. 
@@ -1635,8 +1629,8 @@ some dynamics and which cannot attract any dynamics looms large over this work. As we discussed briefly at the end of Section~\ref{sec:multispherical}, in some simple contexts it is easy to see why certain marginal minima are not viable, but at the moment we do not know how to generalize this. Ideas related to the -stability of minima with respect to perturbations of the landscape have -recently been suggested as a route to understanding this problem, but this is +self-similarity and stochastic stability of minima have +recently been suggested as a route to understanding this problem, but this approach is still in its infancy \cite{Urbani_2024_Statistical}. \begin{acknowledgements} @@ -1648,8 +1642,9 @@ still in its infancy \cite{Urbani_2024_Statistical}. \section{A primer on superspace} \label{sec:superspace} +In this appendix we review the algebra of superspace \cite{DeWitt_1992_Supermanifolds}. The superspace $\mathbb R^{N|2D}$ is a vector space with $N$ real indices and -$2D$ Grassmann indices $\bar\theta_1,\theta_1,\ldots,\bar\theta_D,\theta_D$ \cite{DeWitt_1992_Supermanifolds}. +$2D$ Grassmann indices $\bar\theta_1,\theta_1,\ldots,\bar\theta_D,\theta_D$. The Grassmann indices anticommute like fermions. Their integration is defined by \begin{equation} \int d\theta\,\theta=1 @@ -1660,7 +1655,9 @@ Because the Grassmann indices anticommute, their square is always zero. Therefore, any series expansion of a function with respect to a given Grassmann index will terminate exactly at linear order, while a series expansion with respect to $n$ Grassmann variables will terminate exactly at $n$th order. 
If -$f$ is an arbitrary function, then +$f$ is an arbitrary superspace function, then the integral of $f$ with respect +to a Grassmann index can be evaluated using this property of the series +expansion by \begin{equation} \int d\theta\,f(a+b\theta) =\int d\theta\,\left[f(a)+f'(a)b\theta\right] @@ -1704,7 +1701,7 @@ absolute value sign) that make up the basic Kac--Rice measure, so that we can wr &\qquad=\int d\pmb\phi\,e^{\int d1\,H(\pmb\phi(1))} \end{aligned} \end{equation} -where we have written $d1=d\theta_1\,d\bar\theta_1$ and $d\pmb\phi=d\mathbf +where we have written the measures $d1=d\theta_1\,d\bar\theta_1$ and $d\pmb\phi=d\mathbf x\,d\bar{\pmb\eta}\,d\pmb\eta\,\frac{d\hat{\mathbf x}}{(2\pi)^N}$. Besides some deep connections to the physics of BRST, this compact notation dramatically simplifies the analytical treatment of the problem. The energy of stationary points can also be fixed using this notation, by writing @@ -1756,7 +1753,7 @@ superdeterminant of $M$ is given by \begin{equation} \operatorname{sdet}M=\det(A-BD^{-1}C)\det(D)^{-1} \end{equation} -which is the same for the normal equation for the determinant of a block matrix +which is the same as the normal expression for the determinant of a block matrix save for the inverse of $\det D$. Likewise, the supertrace of $M$ is given by \begin{equation} \operatorname{sTr}M=\operatorname{Tr}A-\operatorname{Tr}D @@ -1773,20 +1770,20 @@ complexity of minima \cite{Annibale_2003_The, Annibale_2003_Supersymmetric, Anni This arises from considering the Kac--Rice formula as a kind of gauge fixing procedure \cite{Zinn-Justin_2002_Quantum}. Around each stationary point consider making the coordinate transformation $\mathbf u=\nabla H(\mathbf x)$. 
-Then in the absence of fixing the trace, the Kac--Rice measure becomes +Then, in the absence of fixing the trace of the Hessian to $\mu$, the Kac--Rice measure becomes \begin{equation} \int d\nu(\mathbf x,\pmb\omega\mid E) =\int\sum_\sigma d\mathbf u\,\delta(\mathbf u)\, \delta\big(NE-H(\mathbf x_\sigma)\big) \end{equation} -where the sum is over stationary points. This integral has a symmetry of its +where the sum is over stationary points $\sigma$. This integral has a symmetry of its measure of the form $\mathbf u\mapsto\mathbf u+\delta\mathbf u$. Under the nonlinear transformation that connects $\mathbf u$ and $\mathbf x$, this implies a symmetry of the measure in the Kac--Rice integral of $\mathbf x\mapsto\mathbf x+(\operatorname{Hess}H)^{-1}\delta\mathbf u$. This symmetry, while exact, is nonlinear and difficult to work with. -When the absolute value sign has been dropped and Grassmann vectors introduced, +When the absolute value function has been dropped and Grassmann vectors introduced to represent the determinant of the Hessian, this symmetry can be simplified considerably. Due to the expansion properties of Grassmann integrals, any appearance of $-\bar{\pmb\eta}\pmb\eta^T$ in the integrand resolves to $(\operatorname{Hess}H)^{-1}$. The @@ -1839,7 +1836,7 @@ trace to $\mu$ explicitly breaks this symmetry, and the simplification is lost. \section{Spectral density in the multispherical spin glass} \label{sec:multispherical.spectrum} -In this appendix we derive an expression for the asymptotic spectral density in +In this appendix we derive an expression for the asymptotic spectral density of the Hessian in the two-sphere multispherical spin glass that we describe in Section \ref{sec:multispherical}. We use a typical approach of employing replicas to compute the resolvent \cite{Livan_2018_Introduction}. 
The resolvent for the @@ -1859,8 +1856,8 @@ y=[\mathbf y^{(1)},\mathbf y^{(2)}]\in\mathbb R^{2N}$ as -\frac12\begin{bmatrix}\mathbf y_a^{(1)}\\\mathbf y_a^{(2)}\end{bmatrix}^T \left( \begin{bmatrix} - \operatorname{Hess}H_1(\mathbf x^{(1)},\omega_1) & -\epsilon \\ - -\epsilon & \operatorname{Hess}H_2(\mathbf x^{(2)},\omega_2) + \partial_1\partial_1H_1(\mathbf x^{(1)})+\omega_1I & -\epsilon I \\ + -\epsilon I & \partial_2\partial_2H_2(\mathbf x^{(2)})+\omega_2I \end{bmatrix} -\lambda I \right)\begin{bmatrix}\mathbf y_a^{(1)}\\\mathbf y_a^{(2)}\end{bmatrix} @@ -1876,21 +1873,21 @@ of overlaps of the vectors $\mathbf y$, then a short and standard calculation in where the effective action $\mathcal S$ is given by \begin{equation} \begin{aligned} - &\mathcal S(Y) - =\lim_{n\to0}\frac1n\left\{ + \mathcal S(Y) + =\lim_{n\to0}\frac1n\Bigg\{ \frac14\sum_{ab}^n\left[ - \sigma_1^2(Y_{ab}^{(11)})^2 - +\sigma_2^2(Y_{ab}^{(22)})^2 + f_1''(1)(Y_{ab}^{(11)})^2 + +f_2''(1)(Y_{ab}^{(22)})^2 \right] +\frac12\sum_a^n\left[ 2\epsilon Y_{aa}^{(12)} +(\lambda-\omega_1)Y_{aa}^{(11)} +(\lambda-\omega_2)Y_{aa}^{(22)} - \right] + \right] \qquad\\ +\frac12\log\det\begin{bmatrix} Y^{(11)}&Y^{(12)}\\Y^{(12)}&Y^{(22)} \end{bmatrix} - \right\} + \Bigg\} \end{aligned} \end{equation} \end{widetext} @@ -1900,8 +1897,8 @@ each of the matrices $Y^{(ij)}$ yields \begin{aligned} \mathcal S(y) &= - \frac14\left[\sigma_1^2(y^{(11)})^2 - +\sigma_2^2(y^{(22)})^2\right]+\epsilon y^{(12)} + \frac14\left[f_1''(1)(y^{(11)})^2 + +f_2''(1)(y^{(22)})^2\right]+\epsilon y^{(12)} \\ & \qquad+\frac12\left[(\lambda-\omega_1)y^{(11)} @@ -1922,20 +1919,20 @@ spectral density at large $N$ is then given by the discontinuity in its imaginary point on the real axis, or \begin{equation} \rho(\lambda) - =\frac1{i\pi N} + =\frac1{2\pi iN} \left( \overline{G(\lambda+i0^+)}-\overline{G(\lambda+i0^-)} \right) \end{equation} -\section{Complexity of dominant optima in the least-squares problem} +\section{Complexity of dominant 
optima for sums of squared random functions} \label{sec:dominant.complexity} Here we share an outline of the derivation of formulas for the complexity of -dominant optima in the random nonlinear least squares problem of section +dominant optima in sums of squared random functions of section \ref{sec:least.squares}. While in this paper we only treat problems with a replica symmetric structure, formulas for the effective action are generic to -any structure and provide a starting point for analyzing the challenging +any \textsc{rsb} structure and provide a starting point for analyzing the challenging full \textsc{rsb} setting. Using the $\mathbb R^{N|2}$ superfields @@ -1951,7 +1948,7 @@ the replicated count of stationary points can be written N\hat\beta E \\ &\qquad-\frac12\int d1\,\left( B(1)\sum_{k=1}^MV_k(\pmb\phi_a(1))^2 - -\mu\big(\|\pmb\phi_a(1)\|^2-N\big) + -\mu\|\pmb\phi_a(1)\|^2 \right) \bigg] \end{aligned} @@ -1974,7 +1971,7 @@ contribution of \eqref{eq:Vv.delta}, which is linear. The average over the disorder can then be computed, which yields \begin{equation} \begin{aligned} - &\overline{\sum_{k=1}^M\sum_{a=1}^n\exp\left[i\int d1\,\hat v_{ka}(1)V_k(\pmb\phi_a(1))\right]} + &\overline{\exp\left[i\sum_{k=1}^M\sum_{a=1}^n\int d1\,\hat v_{ka}(1)V_k(\pmb\phi_a(1))\right]} \\ & =\exp\left[ |