author Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-07-24 15:45:40 +0200
committer Jaron Kent-Dobias <jaron@kent-dobias.com> 2024-07-24 15:45:40 +0200
commit 1f32126c4b9ad5852b9cd529647a74c7e1f8f65f
tree 3b10e25ac9c3f423c5004293d053fb118bb88560
parent 51adb0235c7d0b6826319a099843ba936c22c78e
Added some citations and some text.
Diffstat (limited to 'marginal.tex')
-rw-r--r-- marginal.tex 68
1 file changed, 30 insertions, 38 deletions
diff --git a/marginal.tex b/marginal.tex
index 4c7b55f..f167ec7 100644
--- a/marginal.tex
+++ b/marginal.tex
@@ -75,7 +75,7 @@ compared to stiff minima or saddle points. This ubiquity of behavior suggests
that the distribution of marginal minima can be used to bound out-of-equilibrium dynamical
behavior.
-It is not straightforward to condition on the marginality of minima using the
+Despite their importance in a wide variety of in- and out-of-equilibrium settings \cite{Muller_2015_Marginal, Anderson_1984_Lectures, Sommers_1984_Distribution, Parisi_1995-01_On, Horner_2007_Time, Pankov_2006_Low-temperature, Erba_2024_Quenches, Efros_1985_Coulomb, Shklovskii_2024_Half}, it is not straightforward to condition on the marginality of minima using the
traditional methods for analyzing the distribution of minima in rugged
landscapes. Using the method of a Legendre transformation of the Parisi
parameter corresponding to a set of real replicas, one can force the result to
@@ -166,9 +166,9 @@ of $A$. This produces
\end{aligned}
\end{equation}
as desired.
-The first relation extends a technique first introduced in
-\cite{Ikeda_2023_Bose-Einstein-like} and later used in
-\cite{Kent-Dobias_2024_Arrangement} in the context of random landscapes. A Boltzmann distribution is introduced
+The first relation extends a technique for calculating the typical minimum eigenvalue of an ensemble of matrices first introduced by
+\citeauthor{Ikeda_2023_Bose-Einstein-like} and later used by
+\citeauthor{Kent-Dobias_2024_Arrangement} in the context of random landscapes, and is similar to an earlier technique for conditioning the value of the ground state energy in random landscapes by \citeauthor{Fyodorov_2013_Topology} \cite{Ikeda_2023_Bose-Einstein-like, Kent-Dobias_2024_Algorithm-independent, Fyodorov_2013_Topology, Fyodorov_2018_Hessian}. A Boltzmann distribution is introduced
over a spherical model whose Hamiltonian is quadratic with interaction matrix
given by $A$. In the limit of zero temperature, the measure will concentrate on
the ground states of the model, which correspond with the eigenspace of $A$
@@ -608,7 +608,7 @@ be constant over the replica index at the eventual saddle point solution.
We would like to make a similar treatment of the determinant of the Hessian
that appears in \eqref{eq:kac-rice.measure}. The standard approach is to drop
the absolute value function around the determinant. This can potentially lead
-to severe problems with the complexity. However, it is a justified step when
+to severe problems with the complexity \cite{Fyodorov_2004_Complexity}. However, it is a justified step when
the parameters of the problem $E$, $\mu$, and $\lambda^*$ put us in a
regime where the exponential majority of stationary points have the same index.
This is true for maxima and minima, and for saddle points whose spectra have a
@@ -768,6 +768,13 @@ well known by simpler methods, it is instructive to carry through the
calculation for this case, since we will learn some things about its application in
more nontrivial settings.
+Note that in the pure version of these models with $f(q)=\frac12q^p$, the
+methods of this section must be amended slightly. This is because in these
+models there is an exact correspondence $\mu=-pE$ between the trace of the
+Hessian and the energy, and therefore they cannot be fixed independently. This
+correspondence implies that when $\mu=\mu_\mathrm m$, the corresponding energy level
+$E_\mathrm{th}=-\frac1p\mu_\mathrm m$ contains all marginal minima. This is what gives this threshold energy such singular importance to dynamics in the pure spherical models.
+
The procedure to treat the complexity of the spherical models has been carried out in
detail elsewhere \cite{Kent-Dobias_2023_How}. Here we only sketch the
steps involved. First we notice that
@@ -807,7 +814,7 @@ have introduced only through their scalar products with each other. We therefore
&X^\alpha_{ab}=\frac1N\mathbf x_a\cdot\mathbf s_b^\alpha&
\notag \\
&G_{ab}=\frac1N\bar{\pmb\eta}_a\cdot\pmb\eta_b
- \label{eq:order.parameters}
+ \label{eq:order.parameters}
\end{align}
Order parameters that mix the normal and Grassmann variables generically vanish
in these settings and we don't consider them here \cite{Kurchan_1992_Supersymmetry}.
@@ -1150,7 +1157,13 @@ diagonal not necessarily equal to 1, so
\tilde q^{ij}_0 & q^{ij}_0 & q^{ij}_0 & \cdots & q^{ij}_d
\end{bmatrix}
\end{equation}
-This requires us to introduce two new order parameters per pair $(i,j)$. We also need two separate Lagrange multipliers $\hat q$ and $\hat{\tilde q}$ to enforce the tangent space normalization for the tilde and untilde replicas, respectively. When
+This requires us to introduce two new order parameters $\tilde q^{ij}_d$ and
+$q^{ij}_d$ per pair $(i,j)$, in addition to the off-diagonal order parameters
+$\tilde q_0^{ij}$ and $q_0^{ij}$ already present in \eqref{eq:Q.structure}. We
+also need two separate Lagrange multipliers $\hat q$ and $\hat{\tilde q}$ to
+enforce the tangent space normalization $q_d^{11}+q_d^{22}=1$ and $\tilde q_d^{11}+\tilde q_d^{22}=1$ for the tilde and untilde replicas,
+respectively, which will in general take different values at the saddle point.
+When
this ansatz is inserted into the expression \eqref{eq:multispherical.marginal.action} for the effective action and the
limit of $m\to0$ is taken, we find
\begin{widetext}
@@ -1303,8 +1316,8 @@ The total energy is minus the sum of squares of the $V_k$, or
\end{equation}
The landscape complexity and large deviations of the ground state for the
least-squares version of this problem were recently studied in a linear
-context, with $f(q)=\sigma^2+aq$ \cite{Fyodorov_2020_Counting,
-Fyodorov_2022_Optimization}. Some results on the ground state of the general
+context, with $f(q)=\sigma^2+aq$ \cite{Fyodorov_2019_A, Fyodorov_2020_Counting,
+Fyodorov_2022_Optimization, Vivo_2024_Random}. Some results on the ground state of the general
nonlinear problem can also be found in \cite{Tublin_2022_A}, and a solution to
the equilibrium problem can be found in \cite{Urbani_2023_A}.
Those works indicate that the low-lying minima of the least squares problem tend
@@ -1334,9 +1347,11 @@ corresponds to maximizing the sum of squares, under a replica symmetric ansatz
calculate the complexity of marginal minima in this section.
As in the previous sections, we use the method of Lagrange multipliers to analyze stationary points on the constrained configuration space. The Lagrangian and its associated gradient and Hessian are
-\begin{align}
- &L(\mathbf x,\omega)
+\begin{equation}
+ L(\mathbf x,\omega)
=-\frac12\bigg(\sum_k^MV_k(\mathbf x)^2-\omega\big(\|\mathbf x\|^2-N\big)\bigg) \\
+\end{equation}
+\begin{align}
&\nabla H(\mathbf x,\omega)
=-\sum_k^MV_k(\mathbf x)\partial V_k(\mathbf x)+\omega\mathbf x
\\
@@ -1663,27 +1678,6 @@ still in its infancy \cite{Urbani_2024_Statistical}.
\appendix
-\section{Relationship with previous work}
-
-The title of our paper and that of \citeauthor{Muller_2006_Marginal} suggest
-they address the same topic, but this is not the case
-\cite{Muller_2006_Marginal}. That work differs in three important and
-fundamental ways. First, it describes minima of the TAP free energy and
-involves peculiarities specific to the TAP. Second, it describes dominant
-minima, not a condition for finding subdominant marginal minima. Finally, it
-focuses on minima with a single soft direction (which are the typical minima of
-the low temperature Sherrington--Kirkpatrick TAP free energy), while we aim to
-avoid such minima in favor of ones that have a pseudogap (which we argue are relevant
-to out-of-equilibrium dynamics). The fact that the typical minima studied by
-\citeauthor{Muller_2006_Marginal} are not marginal in this latter sense may
-provide an intuitive explanation for the seeming discrepancy between the proof
-that the low-energy Sherrington--Kirkpatrick model cannot be sampled
-\cite{ElAlaoui_2022_Sampling} and the proof that a message passing algorithm
-can find near-ground states \cite{Montanari_2021_Optimization}: the algorithm
-finds the atypical low-lying states that are marginal in the sense considered
-here but cannot find the typical ones that are marginal in the sense of
-\citeauthor{Muller_2006_Marginal}.
-
\section{A primer on superspace}
\label{sec:superspace}
@@ -2052,6 +2046,7 @@ in a change of measure of the form
\prod_{a=1}^n d\pmb\phi_a=d\mathbb Q\,(\operatorname{sdet}\mathbb Q)^\frac N2
=d\mathbb Q\,\exp\left[\frac N2\log\operatorname{sdet}\mathbb Q\right]
\end{equation}
+\begin{widetext}
We therefore have
\begin{equation}
\begin{aligned}
@@ -2060,8 +2055,6 @@ We therefore have
\exp\bigg\{
nN\hat\beta E+N\frac\mu2\operatorname{sTr}\mathbb Q
+\frac N2\log\operatorname{sdet}\mathbb Q
- \\
- &\qquad
-\frac M2\log\operatorname{sdet}\left[
\delta_{ab}\delta(1,2)+B(1)f(\mathbb Q_{ab}(1,2))
\right]
@@ -2073,8 +2066,8 @@ have from the definition of $\pmb\phi$ and $\mathbb Q$ that
\begin{equation}
\begin{aligned}
&\mathbb Q_{ab}(1,2)
- =C_{ab}-G_{ab}(\bar\theta_1\theta_2+\bar\theta_2\theta_1) \\
- &\qquad-R_{ab}(\bar\theta_1\theta_1+\bar\theta_2\theta_2)
+ =C_{ab}-G_{ab}(\bar\theta_1\theta_2+\bar\theta_2\theta_1)
+ -R_{ab}(\bar\theta_1\theta_1+\bar\theta_2\theta_2)
      -D_{ab}\bar\theta_1\theta_1\bar\theta_2\theta_2
\end{aligned}
\end{equation}
@@ -2087,7 +2080,6 @@ the expression above and evaluating the superdeterminants and supertrace, we fin
\mathcal N(E,\mu)^n=\int d\hat\beta\,dC\,dR\,dD\,dG\,e^{nN\mathcal S_\mathrm{KR}(\hat\beta,C,R,D,G)}
\end{equation}
where the effective action is given by
-\begin{widetext}
\begin{equation}
\begin{aligned}
\mathcal S_\mathrm{KR}(\hat\beta,C,R,D,G)
@@ -2145,7 +2137,6 @@ $\hat\beta$, $c_0$, $r$, and $r_0$ is
\right]
\end{aligned}
\end{equation}
-\end{widetext}
When $f(0)=0$ as in the cases directly studied in this work, this further
simplifies as $c_0=r_0=0$. The effective action is then
\begin{equation}
@@ -2159,6 +2150,7 @@ simplifies as $c_0=r_0=0$. The effective action is then
Extremizing this expression with respect to the
order parameters $\hat\beta$ and $r$ produces the red line of dominant minima
shown in Fig.~\ref{fig:ls.complexity}.
+\end{widetext}
\bibliography{marginal}
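
Note on the hunk at line 166: the text there describes a Boltzmann distribution over a spherical model with a quadratic Hamiltonian, whose zero-temperature limit concentrates on the eigenspace of the smallest eigenvalue of $A$. The following is a minimal numerical sketch of that idea, not the paper's calculation. It assumes a toy size $N=2$ (so that the sphere is a circle and can be integrated on a grid), the convention $H(\mathbf s)=\frac12\mathbf s^TA\mathbf s$, and the hypothetical helper names `boltzmann_average` and `min_eigenvalue_2x2`; none of these choices come from the source.

```python
import math


def min_eigenvalue_2x2(a):
    """Exact smallest eigenvalue of a symmetric 2x2 matrix ((a11, a12), (a12, a22))."""
    (a11, a12), (_, a22) = a
    return 0.5 * (a11 + a22) - math.sqrt((0.5 * (a11 - a22)) ** 2 + a12 ** 2)


def boltzmann_average(a, beta, n_grid=20000):
    """Thermal average of s^T A s / N over the circle |s|^2 = N with N = 2.

    Assumed (illustrative) convention: H(s) = 1/2 s^T A s, weight exp(-beta * H).
    The integral over the circle is approximated by a uniform grid in the angle.
    """
    n = 2  # toy system size, chosen only so the sphere can be parametrized by one angle
    (a11, a12), (_, a22) = a
    radius = math.sqrt(n)
    hamiltonians = []
    observables = []
    for k in range(n_grid):
        theta = 2.0 * math.pi * k / n_grid
        s1 = radius * math.cos(theta)
        s2 = radius * math.sin(theta)
        quad = a11 * s1 * s1 + 2.0 * a12 * s1 * s2 + a22 * s2 * s2  # s^T A s
        hamiltonians.append(0.5 * quad)
        observables.append(quad / n)
    # Subtract the minimum energy before exponentiating to avoid overflow.
    h_min = min(hamiltonians)
    weights = [math.exp(-beta * (h - h_min)) for h in hamiltonians]
    z = sum(weights)
    return sum(w * o for w, o in zip(weights, observables)) / z


if __name__ == "__main__":
    A = ((2.0, 1.0), (1.0, 0.0))
    print("exact minimum eigenvalue:", min_eigenvalue_2x2(A))
    for beta in (0.0, 1.0, 10.0, 200.0):
        print("beta =", beta, "thermal average =", boltzmann_average(A, beta))
```

In this toy setting the thermal average equals the mean of the eigenvalues at `beta = 0` and approaches the smallest eigenvalue from above as `beta` grows, which is the mechanism the paper uses at large $N$ to condition on the bottom of the spectrum.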