diff options
author | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2024-06-26 18:09:45 +0200 |
---|---|---|
committer | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2024-06-26 18:09:45 +0200 |
commit | 0e3d2fbcbcbe68a25a6f7fe90b0b251ee6d35f1a (patch) | |
tree | 6bd3988bb727e6b1ab2d1d93bf6c7b88138b8405 /marginal.tex | |
parent | 455e0b9de4e7ff101eb0e290ddcb559be9d1a020 (diff) | |
download | marginal-0e3d2fbcbcbe68a25a6f7fe90b0b251ee6d35f1a.tar.gz marginal-0e3d2fbcbcbe68a25a6f7fe90b0b251ee6d35f1a.tar.bz2 marginal-0e3d2fbcbcbe68a25a6f7fe90b0b251ee6d35f1a.zip |
More work.
Diffstat (limited to 'marginal.tex')
-rw-r--r-- | marginal.tex | 76 |
1 files changed, 40 insertions, 36 deletions
diff --git a/marginal.tex b/marginal.tex index 408a1da..f46ed38 100644 --- a/marginal.tex +++ b/marginal.tex @@ -1236,7 +1236,7 @@ subspace and a stable minimum on the other. corresponding to a marginal spectrum for multispherical spin glasses with $\sigma_1^2=f_1''(1)=1$, $\sigma_2^2=f_2''(1)=1$, and various $\epsilon$. \textbf{(b)}~Spectra corresponding to the parameters $\omega_1$ and - $\omega_2$ marked by the circles on the lefthand plot. + $\omega_2$ marked by the circles in panel (a). \textbf{(c)}~The complexity of marginal minima in a multispherical model with $f_i(q)=q^{p_i}/[p_i(p_i-1)]$ for $p_1=3$ and $p_2=4$ for a variety of $\epsilon$. Since $f_1''(1)=f_2''(1)=1$, the marginal values correspond @@ -1274,23 +1274,28 @@ In this subsection we consider perhaps the simplest example of a non-Gaussian landscape: the problem of random nonlinear least squares optimization. Though, for reasons we will see it is easier to make predictions for random nonlinear \emph{most} squares, i.e., the problem of maximizing the sum of squared terms. -We also take a spherical configuration space with $\mathbf x\in S^{N-1}$ and $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$ as in the spherical spin glasses, and consider a set +We again take a spherical configuration space with $\mathbf x\in S^{N-1}$ and $0=g(\mathbf x)=\frac12(\|\mathbf x\|^2-N)$ as in the spherical spin glasses, and consider a set of $M$ random functions $V_k:\mathbf S^{N-1}\to\mathbb R$ that are centered Gaussians with covariance \begin{equation} \overline{V_i(\mathbf x)V_j(\mathbf x')}=\delta_{ij}f\left(\frac{\mathbf x\cdot\mathbf x'}N\right) \end{equation} The energy is minus the sum of squares of the $V_k$, or -\begin{equation} +\begin{equation} \label{eq:ls.hamiltonian} H(\mathbf x)=-\frac12\sum_{k=1}^MV_k(\mathbf x)^2 \end{equation} -The landscape complexity and large deviations of the ground state for this problem were recently studied in a linear context, with $f(q)=\sigma^2+aq$ \cite{Fyodorov_2020_Counting, Fyodorov_2022_Optimization}. Some results on the ground state of the general nonlinear problem can also be found in \cite{Tublin_2022_A}. In particular, that work indicates that the low-lying minima of the problem tend to be either replica symmetric or full replica symmetry breaking. This is not good news for our analysis or marginal states, because in the former case the problem is typically easy to solve, and in the latter the analysis becomes much more technically challenging. - -\cite{Urbani_2023_A, Kamali_2023_Dynamical, Kamali_2023_Stochastic, Urbani_2024_Statistical} -\cite{Montanari_2023_Solving, Montanari_2024_On} -\cite{Subag_2020_Following} +The landscape complexity and large deviations of the ground state for the +least-squares version of this problem were recently studied in a linear +context, with $f(q)=\sigma^2+aq$ \cite{Fyodorov_2020_Counting, +Fyodorov_2022_Optimization}. Some results on the ground state of the general +nonlinear problem can also be found in \cite{Tublin_2022_A}, and a solution to +the equilibrium problem can be found in \cite{Urbani_2023_A}. In particular, +that work indicates that the low-lying minima of the least squares problem tend +to be either replica symmetric or full replica symmetry breaking. To avoid +either a trivial analysis or a very complex one, we instead focus on maximizing +the sum of squares, or minimizing \eqref{eq:ls.hamiltonian}. Fortunately, the \emph{maxima} of this problem have a more amenable structure -for study, as they are typically described by 1-RSB like structure. There is a +for study, as they are typically described by {\oldstylenums1}\textsc{rsb}-like structure. There is a heuristic intuition for this: in the limit of $M\to1$, this problem is just the square of a spherical spin glass landscape. The distribution and properties of stationary points low and high in the spherical spin glass are not changed, @@ -1300,17 +1305,26 @@ bottom, however, consists of the zero-energy level set in the spherical spin glass. This level set is well-connected, and so the ground states should also be well connected and flat. -Focusing on the top of the landscape and therefore dealing with a 1-RSB like -problem is good for our analysis. First, algorithms will tend to be stuck in -the ways they are for hard optimization problems, and second we will be able -to explicitly predict where. Therefore, we will study the most squares problem -rather than the least squares one. We calculate the complexity of maxima under a replica symmetric ansatz (which covers 1-RSB like problems) for arbitrary covariance $f$, and then the marginal complexity. +Focusing on the top of the landscape and therefore dealing with a {\oldstylenums1}\textsc{rsb}-like +problem is good for our analysis. Algorithms will tend to be stuck in the ways +they are for hard optimization problems, and we will be able to explicitly +predict where. Therefore, we will study the most squares problem rather than +the least squares one. We calculate the complexity of minima of +\eqref{eq:ls.hamiltonian} in Appendix~\ref{sec:dominant.complexity}, which +corresponds to maximizing the sum of squares, under a replica symmetric ansatz +(which covers {\oldstylenums1}\textsc{rsb}-like problems) for arbitrary covariance $f$, and we +calculate the complexity of marginal minima in this section.. Applying the Lagrange multiplier method detailed above to enforce the spherical constraint, the gradient and Hessian are \begin{align} - \nabla H(\mathbf x,\omega)=\sum_k^MV_k(\mathbf x)\partial V_k(\mathbf x)+\omega\mathbf x + &\nabla H(\mathbf x,\omega) + =-\sum_k^MV_k(\mathbf x)\partial V_k(\mathbf x)+\omega\mathbf x \\ - \operatorname{Hess}H(\mathbf x,\omega)=\partial V_k(\mathbf x)\partial V_k(\mathbf x)+V_k(\mathbf x)\partial\partial V_k(\mathbf x)+\omega I + &\begin{aligned} + &\operatorname{Hess}H(\mathbf x,\omega) \\ + &\qquad=-\sum_k^M\left[\partial V_k(\mathbf x)\partial V_k(\mathbf x) + -V_k(\mathbf x)\partial\partial V_k(\mathbf x)\right]+\omega I + \end{aligned} \end{align} As in the spherical and multispherical models, fixing the trace of the Hessian at largest order in $N$ is equivalent to constraining the value of the Lagrange @@ -1319,31 +1333,18 @@ matrix contribute typical values at a lower order in $N$. The derivation of the marginal complexity for this model is complicated, but can be made schematically like that of the derivation of the equilibrium free -energy by use of superspace coordinates \cite{DeWitt_1992_Supermanifolds}. -The use of superspace coordinates in the geometry and dynamics of disordered -systems is well-established. Here, we introduce a novel extension of the -traditional approach to incorporate the marginality condition. -Consider supervectors in the $\mathbb R^{N|4}$ superspace of the form -\begin{equation} - \pmb\phi_a^\alpha(1,2) - =\mathbf x_a - +\bar\theta_1\pmb\eta_a+\bar{\pmb\eta}_a\theta_1 - +i\hat{\mathbf x}_a\bar\theta_1\theta_1 - +\mathbf s_a^\alpha(\bar\theta_1\theta_2+\bar\theta_2\theta_1) -\end{equation} -The traditional complexity problem, outlined in the appendix -\ref{sec:dominant.complexity}, involves a supervector without the last term. +energy by use of superspace coordinates. Following the framework outlined in +Section~\ref{sec:superspace_kac-rice}, the replicated number of stationary +points conditioned on energy $E$, trace $\mu$, and minimum eigenvalue +$\lambda^*$ is given by \begin{widetext} - Using the notation of Section \ref{sec:superspace_kac-rice}, the replicated - number of stationary points conditioned on energy $E$, trace $\mu$, and - minimum eigenvalue $\lambda^*$ is then given by \begin{equation} \begin{aligned} \mathcal N(E,\mu,\lambda^*)^n - &=\int\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_a^\alpha + &=\int d\hat\beta\,d\hat\lambda\prod_{a=1}^n\lim_{m_a\to0}\prod_{\alpha=1}^{m_a}d\pmb\phi_a^\alpha \exp\left\{ - \delta^{\alpha1}N(\hat\beta_aE+\hat\lambda_a\lambda^*) - -\frac12\int d1\,d2\,B_a^\alpha(1,2)\left[\sum_{k=1}^MV_k(\pmb\phi_a^\alpha)^2 + \delta^{\alpha1}N(\hat\beta E+\hat\lambda\lambda^*) + -\frac12\int d1\,d2\,B^\alpha(1,2)\left[\sum_{k=1}^MV_k(\pmb\phi_a^\alpha)^2 -\mu(\|\pmb\phi_a^\alpha\|^2-N)\right] \right\} \end{aligned} @@ -1492,6 +1493,9 @@ taking the zero-temperature limit, we find } \label{fig:ls.complexity} \end{figure} +\cite{Urbani_2023_A, Kamali_2023_Dynamical, Kamali_2023_Stochastic, Urbani_2024_Statistical} +\cite{Montanari_2023_Solving, Montanari_2024_On} + \section{Conclusions} \label{sec:conclusion} |