author | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2024-06-12 22:15:18 +0200 |
---|---|---|
committer | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2024-06-12 22:15:18 +0200 |
commit | abf30e8937043b05c74d12b665ac541aa269dca3 (patch) | |
tree | 03ee0709a8e5ddfcc3701dd9de81706878b79166 /marginal.tex | |
parent | 1a2dc62a5e75c94e3cfc7da8334061743ba70d05 (diff) | |
Mostly writing on the multispherical model.
Diffstat (limited to 'marginal.tex')
-rw-r--r-- | marginal.tex | 160 |
1 files changed, 128 insertions, 32 deletions
diff --git a/marginal.tex b/marginal.tex
index 0eb6e5b..3766757 100644
--- a/marginal.tex
+++ b/marginal.tex
@@ -22,20 +22,23 @@
 }
 
 \author{Jaron Kent-Dobias}
-\affiliation{Istituto Nazionale di Fisica Nucleare, Sezione di Roma I, Rome, Italy 00184}
+\affiliation{
+  Istituto Nazionale di Fisica Nucleare, Sezione di Roma I, Rome, Italy 00184
+}
 
 \begin{abstract}
-  Marginal optima are minima or maxima of a function with many nearly
-  flat directions. In settings with many competing optima, marginal ones tend
-  to attract algorithms and physical dynamics. Often, the important family of
+  Marginal optima are minima or maxima of a function with many nearly flat
+  directions. In settings with many competing optima, marginal ones tend to
+  attract algorithms and physical dynamics. Often, the important family of
   marginal attractors are a vanishing minority compared with nonmarginal optima
   and other unstable stationary points. We introduce a generic technique for
   conditioning the statistics of stationary points on their marginality, and
-  apply it in three isotropic settings with qualitatively different structure: in the spherical spin-glasses, where the Hessian is GOE;
-  in a multispherical spin glasses, which are Gaussian but non-GOE; and in a
-  model of random nonlinear sum of squares, which is non-Gaussian. In these
-  problems we are able to fully characterize the distribution of marginal
-  optima in the landscape, including when they are in the minority.
+  apply it in three isotropic settings with qualitatively different structure:
+  in the spherical spin-glasses, where the Hessian is GOE; in a multispherical
+  spin glasses, which are Gaussian but non-GOE; and in a model of random
+  nonlinear sum of squares, which is non-Gaussian. In these problems we are
+  able to fully characterize the distribution of marginal optima in the
+  landscape, including when they are in the minority.
 \end{abstract}
 
 \maketitle
@@ -96,7 +99,7 @@ more useful.
 \section{Conditioning on the smallest eigenvalue}
-
+\subsection{The general method}
 
 An arbitrary function $g$ of the minimum eigenvalue of a matrix $A$ can be
 expressed as
@@ -822,6 +825,7 @@ that the marginal complexity in these models is simply the ordinary complexity
 evaluated at a fixed trace of the Hessian.
 
 \subsection{Multispherical spin glasses}
+\label{sec:multispherical}
 
 The multispherical models are a simple extension of the spherical ones, where
 the configuration space is taken to be the union of more than one hypersphere.
@@ -829,15 +833,15 @@ Here we consider the specific case where the configuration space is the union
 of two $(N-1)$-spheres, with $\Omega=S^{N-1}\times S^{N-1}$, and where the
 energy is given by
 \begin{equation}
-  H(\mathbf x)=H_1(\mathbf x^{(1)})+H_2(\mathbf x^{(2)})+\epsilon\mathbf x^{(1)}\cdot\mathbf x^{(2)}
+  H(\mathbf x)=H_1(\mathbf x^{(1)})+H_2(\mathbf x^{(2)})-\epsilon\mathbf x^{(1)}\cdot\mathbf x^{(2)}
 \end{equation}
 for $\mathbf x=[\mathbf x^{(1)},\mathbf x^{(2)}]$ for components $\mathbf
 x^{(1)},\mathbf x^{(2)}\in\mathbb R^N$. Each individual sphere energy $H_s$ is
 taken to be a centered Gaussian random function with a covariance given in the
 usual spherical way by
 \begin{equation}
-  \overline{H_s(\pmb\sigma_1)H_p(\pmb\sigma_2)}
-  =N\delta_{sp}f_s\left(\frac{\pmb\sigma_1\cdot\pmb\sigma_2}N\right)
+  \overline{H_i(\pmb\sigma_1)H_j(\pmb\sigma_2)}
+  =N\delta_{ij}f_i\left(\frac{\pmb\sigma_1\cdot\pmb\sigma_2}N\right)
 \end{equation}
 with the functions $f_1$ and $f_2$ not necessarily the same. In this problem,
 there is an energetic competition between the independent spin glass energies
@@ -850,20 +854,20 @@ the previous example of the spherical models, the spectrum of the Hessian at
 different points in the configuration space has different shapes. This appears
 in this problem through the presence of a configuration space defined by
 multiple constraints, and therefore multiple Lagrange multipliers are necessary
-to ensure they are all fixed.
+to ensure they are all fixed. The resulting Lagrangian, gradient, and Hessian are
 \begin{align}
-  H(\mathbf x)
-  +\frac12\omega^{(1)}\big(\|\mathbf x^{(1)}\|^2-N\big)
-  +\frac12\omega^{(2)}\big(\|\mathbf x^{(2)}\|^2-N\big)
+  L(\mathbf x)&=H(\mathbf x)
+    +\frac12\omega^{(1)}\big(\|\mathbf x^{(1)}\|^2-N\big) \\
+    &\qquad\qquad\qquad+\frac12\omega^{(2)}\big(\|\mathbf x^{(2)}\|^2-N\big)
   \\
   \nabla H(\mathbf x,\pmb\omega)
-  =\partial H(\mathbf x)+\begin{bmatrix}
+  &=\partial H(\mathbf x)+\begin{bmatrix}
     \omega^{(1)}\mathbf x^{(1)} \\
     \omega^{(2)}\mathbf x^{(2)}
   \end{bmatrix}
   \\
   \operatorname{Hess}H(\mathbf x,\pmb\omega)
-  =\partial\partial H(\mathbf x)+\begin{bmatrix}
+  &=\partial\partial H(\mathbf x)+\begin{bmatrix}
     \omega^{(1)}I&0 \\
     0&\omega^{(2)}I
   \end{bmatrix}
@@ -873,24 +877,35 @@ equivalent to a constraint on the Lagrange multipliers. However, in this case
 it corresponds to $\mu=\omega^{(1)}+\omega^{(2)}$, and therefore they are
 not uniquely fixed by the trace.
 
-\begin{widetext}
-\begin{equation}
-  \mathcal S(C,R,D,W,\hat\beta,\omega)
-  =\frac12\frac1n
-  \sum_{ab}\left(
-    \hat\beta^2f(C_{ab})+(2\hat\beta R_{ab}-D_{ab})f'(C_{ab})+(R_{ab}^2-W_{ab}^2)f''(C_{ab})
-  \right)
-\end{equation}
+Since the energy in the multispherical models is Gaussian, the properties of
+the matrix $\partial\partial H$ are again independent of the energy and
+gradient. This means that the form of the Hessian is parameterized solely by
+the values of the Lagrange multipliers $\omega^{(1)}$ and $\omega^{(2)}$, just
+as $\mu=\omega$ alone parameterized the Hessian in the spherical spin glasses.
+Unlike that case, however, the Hessian takes different shapes with different
+spectral widths depending on their precise combination.
 
+\begin{widetext}
 \begin{equation}
   \begin{aligned}
-    &\mathcal S(C^{11},R^{11},D^{11},W^{11},\hat\beta)+\mathcal S(C^{22},R^{22},D^{22},W^{22},\hat\beta)
-    -\epsilon(r_{12}+r_{21})-\omega_1(r^{11}_d-w^{11}_d)-\omega_2(r^{22}_d-w^{22}_d)+\hat\beta E \\
-    &+\frac12\log\det\begin{bmatrix}C^{11}&iR^{11}\\iR^{11}&D^{11}\end{bmatrix}
+    &\mathcal S_\mathrm{MSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11},C^{22},R^{22},D^{22},G^{22})= \\
+    &\quad
+    \mathcal S_\mathrm{SSG}(\hat\beta,C^{11},R^{11},D^{11},G^{11}\mid E_1,\omega_1)
+    +\mathcal S_\mathrm{SSG}(\hat\beta,C^{22},R^{22},D^{22},G^{22}\mid E_2,\omega_2)
+    -\epsilon(r_{12}+r_{21})+\hat\beta(E-E_1-E_2) \\
+    &\quad
     +\frac12\log\det\left(
-      \begin{bmatrix}C^{22}-q_{12}^2C^{11}&iR^{22}\\iR^{22}&D^{22}\end{bmatrix}
+      I+
+      \begin{bmatrix}C^{11}&iR^{11}\\iR^{11}&D^{11}\end{bmatrix}^{-1}
+      \begin{bmatrix}
+        C^{12} & iR^{12} \\ iR^{21} & D^{12}
+      \end{bmatrix}
+      \begin{bmatrix}C^{22}&iR^{22}\\iR^{22}&D^{22}\end{bmatrix}^{-1}
+      \begin{bmatrix}
+        C^{12} & iR^{21} \\ iR^{21} & D^{12}
+      \end{bmatrix}
     \right)
-    -\log\det(W^{11}W^{22}+W^{12}W^{21})
+    -\log\det(I+(G^{11}G^{22})^{-1}G^{12}G^{21})
   \end{aligned}
 \end{equation}
 
@@ -1333,6 +1348,87 @@ These identities establish $G_{ab}=-R_{ab}$ and $D_{ab}=\hat\beta R_{ab}$,
 allowing elimination of the matrices $G$ and $D$ in favor of $R$. Fixing the
 trace to $\mu$ explicitly breaks this symmetry, and the simplification is lost.
 
+\section{Spectral density in the multispherical spin glass}
+\label{sec:multispherical.spectrum}
+
+In this appendix we derive an expression for the asymptotic spectral density in
+the two-sphere multispherical spin glass that we describe in Section
+\ref{sec:multispherical}. \cite{Livan_2018_Introduction}
+\begin{equation}
+  \begin{aligned}
+    &G(\lambda)
+    =\lim_{n\to0}\int\|\mathbf y_1\|^2\,\prod_{a=1}^nd\mathbf y_a\,
+    \exp\left\{
+      -\frac12\mathbf y_a^T(\operatorname{Hess}H(\mathbf x,\pmb\omega)+\lambda I)\mathbf y_a
+    \right\} \\
+    &
+    =\lim_{n\to0}\int\big(\|\mathbf y_1^{(1)}\|^2+\|\mathbf y_1^{(2)}\|^2\big)\,\prod_{a=1}^nd\mathbf y_a\, \\
+    &\times\exp\left\{
+      -\frac12\begin{bmatrix}\mathbf y_a^{(1)}\\\mathbf y_a^{(2)}\end{bmatrix}^T
+      \left(
+        \begin{bmatrix}
+          \operatorname{Hess}H_1(\mathbf x^{(1)},\omega_1) & -\epsilon \\
+          -\epsilon & \operatorname{Hess}H_2(\mathbf x^{(2)},\omega_2)
+        \end{bmatrix}
+        +\lambda I
+      \right)\begin{bmatrix}\mathbf y_a^{(1)}\\\mathbf y_a^{(2)}\end{bmatrix}
+    \right\} \\
+  \end{aligned}
+\end{equation}
+If $Y_{ab}^{(ik)}=\frac1N\mathbf y_a^{(i)}\cdot\mathbf y_b^{(j)}$ is the matrix
+of overlaps of the $\mathbf y$, then a short and standard calculation yields
+\begin{equation}
+  G(\lambda)=N\lim_{n\to0}\int dY\,(Y_{11}^{(11)}+Y_{11}^{(22)})\,
+  e^{nN\mathcal S(Y)}
+\end{equation}
+for
+\begin{equation}
+  \begin{aligned}
+    &\mathcal S(Y)
+    =\frac1n\sum_{ab}\left[
+      \sigma_1^2(Y_{ab}^{(11)})^2
+      +\sigma_2^2(Y_{ab}^{(22)})^2
+    \right]+\frac12\log\det\begin{bmatrix}
+      Y^{(11)}&Y^{(12)}\\Y^{(12)}&Y^{(22)}
+    \end{bmatrix}\\
+    &+\frac1n\sum_a^n\left[
+      2\epsilon Y_{aa}^{(12)}
+      -\omega_1Y_{aa}^{(11)}
+      -\omega_2Y_{aa}^{(22)}
+      +\lambda(Y_{aa}^{(11)}
+      +Y_{aa}^{(22)})
+    \right]
+  \end{aligned}
+\end{equation}
+Making the replica symmetric ansatz $Y_{ab}^{(ij)}=y^{(ij)}\delta_{ab}$ yields
+\begin{equation}
+  \begin{aligned}
+    &\mathcal S(y)
+    =
+    \sigma_1^2(y^{(11)})^2
+    +\sigma_2^2(y^{(22)})^2
+    +\frac12\log(
+      y^{(11)}y^{(22)}-y^{(12)}y^{(12)}
+    )\\
+    &+2\epsilon y^{(12)}
+    -\omega_1y^{(11)}
+    -\omega_2y^{(22)}
+    +\lambda(y^{(11)}
+    +y^{(22)})
+  \end{aligned}
+\end{equation}
+\begin{equation}
+  \overline{G(\lambda)}
+  =N(y^{(11)}+y^{(22)})
+\end{equation}
+\begin{equation}
+  \rho(\lambda)
+  =\frac1{i\pi N}
+  \left(
+    \overline{G(\lambda+i0^+)}-\overline{G(\lambda+i0^-)}
+  \right)
+\end{equation}
+
 \section{Complexity of dominant optima in the least-squares problem}
 \label{sec:dominant.complexity}