when_annealed.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323

\documentclass[fleqn,a4paper]{article}

\usepackage[utf8]{inputenc} % why not type "Bézout" with unicode?
\usepackage[T1]{fontenc} % vector fonts plz
\usepackage{fullpage,amsmath,amssymb,latexsym,graphicx}
\usepackage{newtxtext,newtxmath} % Times for PR
\usepackage{appendix}
\usepackage[dvipsnames]{xcolor}
\usepackage[
  colorlinks=true,
  urlcolor=MidnightBlue,
  citecolor=MidnightBlue,
  filecolor=MidnightBlue,
  linkcolor=MidnightBlue
]{hyperref} % ref and cite links with pretty colors
\usepackage[
  style=phys,
  eprint=true,
  maxnames = 100
]{biblatex}
\usepackage{anyfontsize,authblk}

\addbibresource{when_annealed.bib}

\begin{document}

\title{
  When is the average number of saddles typical?
}

\author{Jaron Kent-Dobias}
\affil{Istituto Nazionale di Fisica Nucleare, Sezione di Roma I}

\maketitle
\begin{abstract}
  A common measure of the complexity of a function is the count of its
  stationary points. For complicated functions,
  this count often grows exponentially with the volume and dimension of their
  domain. In practice, the count is averaged over a class of such functions
  (the annealed average), but the large numbers involved can result in averages
  biased by extremely rare samples. Typical counts are reliably found by taking
  the average of the logarithm (the quenched average), which is more costly and
  not often done in practice. When most stationary points are uncorrelated with each other, quenched and anneals averages are equal. There are heuristics from equilibrium calculations that guarantee when most of the lowest minima will be uncorrelated. Here, we show that these equilibrium
  heuristics cannot be used to draw conclusions about other minima and saddles. We produce examples among Gaussian-correlated functions on the
  hypersphere where the count of certain saddles and minima has different quenched and
  annealed averages, despite being guaranteed `safe' in the equilibrium
  setting. We produce the necessary conditions for the emergence of nontrivial correlations between saddles. We discuss the implications for the geometry of those functions, and
  in what out-of-equilibrium settings might show a signature of this.
\end{abstract}

Random energies, cost functions, and interaction networks are important to many
areas of modern science. The energy landscape glasses, the likelihood landscape
of machine learning and high-dimensional inference, and the interactions
between organisms in an ecosystem are just a few examples. A traditional tool
for making sense of the diverse behavior these systems exhibit is to analyze
the statistics of points where their dynamics is stationary. For energy or cost
landscapes, these correspond to the minima, maxima, and saddles of the
function, while for ecosystems and other purely dynamical systems these
correspond to the equilibria of the dynamics. When many stationary points are
present, the system is considered complex.

Despite the importance of stationary point statistics for understanding complex
behavior, they are often calculated using an uncontrolled approximation.
Because their number is so large, it cannot reliably be averaged over disorder
in the system. The annealed approximation takes this average anyway, risking a
systematic bias by rare and atypical samples. The anneal approximation is known
to be exact for certain models and in certain circumstances, but it is widely
used outside those circumstances without much reflection. In a few cases, the
controlled quenched average, which averages the logarithm of their number, has
been considered \cite{Ros_2019_Complexity, Kent-Dobias_2023_How, Ros_2023_Quenched}.

One heuristic line of reasoning for the correctness of the annealed
approximation for the statistics of stationary points is sometimes made when
the annealed approximation is correct for an equilibrium calculation on the same
system. The argument goes like this: since the limit of zero temperature or
noise in an equilibrium calculation concentrates the measure onto the lowest
set of minima, the equilibrium average at very low temperature should be
governed by the same statistics as the count of that lowest set of minima. This
argument is valid, but only for the lowest set of minima, which at least in
glassy problems are rarely relevant to dynamical behavior. What about the
\emph{rest} of the stationary points?

In this paper, we show that the behavior of the ground state, or \emph{any}
equilibrium behavior, does not govern whether stationary points will have a
correct annealed average. In a prototypical family of models of random
functions, we calculate the conditions for when annealed averages should fail
and stationary points should have nontrivial correlations in their mutual
position. We produce examples of models whose equilibrium is guaranteed to
never see correlations between states, but where a population of saddle points
is correlated.

We study classes of Gaussian-correlated random functions with isotropic
statistics on the $(N-1)$-sphere. Each class of functions $H:S^{N-1}\to\mathbb
R$ is defined by the average covariance between the function evaluated at two
different points $\pmb\sigma_1,\pmb\sigma_2\in S^{N-1}$, which is a function of
the scalar product or overlap between the two configurations:
\begin{equation} \label{eq:covariance}
  \overline{H(\pmb\sigma_1)H(\pmb\sigma_2)}=\frac1Nf\bigg(\frac{\pmb\sigma_1\cdot\pmb\sigma_2}N\bigg)
\end{equation}
Specifying the covariance function $f$ uniquely specifies the class of functions. The
series coefficients of $f$ need to be nonnnegative in order for $f$ to be a
well-defined covariance. The case where $f$ is a homogeneous polynomial has
been extensively studied, and corresponds to the pure spherical models of glass
physics or the spiked tensor models of statistical inference. Here we will
study cases where $f(q)=\frac12\big(\lambda q^p+(1-\lambda)q^s\big)$ for
$\lambda\in(0,1)$. These are examples of \emph{mixed} spherical models.

These classes of functions have been extensively studied in the physics and
statistics literature, and host a zoo of complex orders and phase transitions
\cite{Crisanti_2004_Spherical, Crisanti_2006_Spherical, Crisanti_2011_Statistical}.

There are two important features which differentiate stationary points in the
spherical models: their \emph{energy density} $E=\frac1NH(\pmb\sigma^*)$ and
their \emph{stability}
$\mu=\frac1N\operatorname{\mathrm{Tr}}\operatorname{\mathrm{Hess}}H(\pmb\sigma^*)$.
The energy density should be familiar, as the `height' in the landscape. The
stability is so-called because it governs the spectrum of the stationary point.
In each spherical model, the spectrum of each stationary point is a Wigner
semicircle of the same width $\mu_\mathrm m=\sqrt{4f''(1)}$, but shifted by
constant. The stability $\mu$ sets this constant shift. When $\mu<\mu_\mathrm
m$, the spectrum still has support over zero and we have saddles with an
extensive number of downward directions. When $\mu>\mu_\mathrm m$ the spectrum
has support only over positive eigenvalues, and we have stable minima. When
$\mu=\mu_\mathrm m$, the spectrum has a pseudogap, and we have marginal minima.

\begin{figure}
  \centering
  \includegraphics{figs/phases_34.pdf}
  \caption{
    A phase diagram of the boundaries we discuss in this paper for the $3+s$
    model with $f=\frac12\big(\lambda q^3+(1-\lambda)q^s\big)$. The blue region
    shows where there exist some stationary points whose complexity is
    {\oldstylenums1}\textsc{rsb}. The yellow region shows where $f$ is not
    convex and therefore \textsc{rsb} solutions are possible in equilibrium.
    The green region shows where \textsc{rsb} solutions are correct at
    the ground state, adapted from \cite{Auffinger_2022_The}.
  } \label{fig:phases}
\end{figure}

The number $\mathcal N(E,\mu)$ of stationary points with energy density $E$ and
stability $\mu$ is exponential in $N$ for these models. The complexity is
defined by the average of the logarithm of this, or
$\Sigma(E,\mu)=\frac1N\overline{\log\mathcal N(E,\mu)}$. More often the annealed
complexity is calculated, where the average is taken before the logarithm:
$\Sigma_\mathrm a(E,\mu)=\frac1N\log\overline{\mathcal N(E,\mu)}$.

When the complexity is calculated using the Kac--Rice formula and a physicists'
tool set, the problem is reduced to the evaluation of an integral by the saddle
point method for large $N$ \cite{Kent-Dobias_2023_How}. The complexity is given
by extremizing an effective action,
$\Sigma(E,\mu)=\mathop{\mathrm{extremum}}_{q_1,x}\mathcal S(q_1,x\mid E,\mu)$
for the action $\mathcal S$ given by
\begin{equation}
  \begin{aligned}
    \mathcal S&(q_1,x\mid E,\mu)
    =\mathcal D(\mu)
    +\mathop{\textrm{extremum}}_{\hat\beta,r_\mathrm d,r_1,d_\mathrm d,d_1}
    \Bigg\{
      \hat\beta E-r_\mathrm d\mu\\
      &+\frac12\bigg[
        \hat\beta^2\big[f(1)-\Delta xf(q_1)\big]
        +(2\hat\beta r_\mathrm d-d_\mathrm d)f'(1)
        -\Delta x(2\hat\beta r_1-d_1)f'(q_1)
        +r_\mathrm d^2f''(1)-\Delta x\,r_1^2f''(q_1) \\
        &+\log\Big(
          \big(r_\mathrm d-\Delta x\,r_1\big)^2+d_\mathrm d\big(1-\Delta x\,q_1\big)-\Delta x\,d_1\big(1-\Delta xq_1\big)
          \Big)
          -\frac{\Delta x}x\log\Big(
            (r_\mathrm d-r_1)^2+d_\mathrm d\big(1-\Delta xq_1\big)
          \Big)
      \bigg]
    \Bigg\}
  \end{aligned}
\end{equation}
where $\Delta x=1-x$ and
\begin{equation}
  \mathcal D(\mu)
  =\begin{cases}
    \frac12+\log\left(\frac12\mu_\text m\right)+\frac{\mu^2}{\mu_\text m^2}
     & \mu^2\leq\mu_\text m^2 \\
    \frac12+\log\left(\frac12\mu_\text m\right)+\frac{\mu^2}{\mu_\text m^2}
    -\left|\frac{\mu}{\mu_\text m}\right|\sqrt{\big(\frac\mu{\mu_\text m}\big)^2-1}
    -\log\left(\left|\frac{\mu}{\mu_\text m}\right|-\sqrt{\big(\frac\mu{\mu_\text m}\big)^2-1}\right) & \mu^2>\mu_\text m^2
  \end{cases}
\end{equation}
The extremal problem is quadratic in $\hat\beta$, $r_\mathrm d$, $r_1$,
$d_\mathrm d$, and $d_1$ and therefore its solution is unique and can be found explicitly, but the
resulting formula is much more complicated so we do not include it here. There
can be multiple extrema at which to evaluate $\mathcal S$, in this case the one
for which $\Sigma$ is \emph{smallest} gives the correct solution. There is
always a solution for $x=1$ which is independent of $q_1$, which corresponds to
the replica symmetric case and which is equal to the annealed calculation. The
crux of this paper will be to determine when this solution is not the global
one.

where we define for brevity (here and elsewhere) the constants
\begin{align}
  u_f=f(f'+f'')-f'^2
  &&
  v_f=f'(f''+f''')-f''^2 \\
  w_f=2f''^2+f'(f'''-2f'')
  &&
  y_f=f'(f'-f)+f''f
  &&
  z_f=f(f''-f')+f'^2
\end{align}
It isn't accurate to say that a solution to the saddle point equations is
`stable' or `unstable.' The problem of solving the complexity in this way is
not a variational problem, so there is nothing to be maximized or minimized,
and in general even global solutions have positive and negative eigenvalues of
the Hessian. However, the eigenvalues of the Hessian can still tell us
something about the emergence of new solutions: when another solution
bifurcates smoothly from an existing one, the Hessian evaluated at that point
will have a zero eigenvalue. Unfortunately this is a difficult procedure to
apply in general, since on must know the parameters in the new solution, and
some parameters, e.g., $q_1$, are unconstrained in the old solution.

There is one place where one can consistently search for a bifurcating solution
to the saddle point equations: along the zero complexity line
$\Sigma(E,\mu)=0$. Going along this line in the replica symmetric solution, the
{\oldstylenums1}\textsc{rsb} complexity transitions at a critical point where
$x=q_1=1$ \cite{Kent-Dobias_2023_How}. Since all the parameters in the
bifurcating solution are known at this point, one can search for it by looking
for a zero eigenvalue in the way described above. In the replica symmetric
solution for points describing saddles, this line is
\begin{equation} \label{eq:extremal.line}
  \mu=-\frac1{z_f}\left(-2Ef'f''+\sqrt{-2f''u_f\big(E^2(f''-f')-\log\frac{f''}{f'}z_f\big)}\right)
\end{equation}
Let $M$ be the matrix of double partial derivatives of $\mathcal S$ with
respect to $q_1$ and $x$. We evaluate $M$ at the replica symmetric saddle point
$x=1$ with the additional constraint that $q_1=1$ and along the extremal
complexity line \eqref{eq:extremal.line}. We determine when a zero eigenvalue
appears, indicated the presence of a bifurcating {\oldstylenums1}\textsc{rsb}
solution, by solving $0=\det M$. We find
\begin{equation}
  \det M=-\left(\frac{\partial^2\mathcal S}{\partial q_1\partial x}\bigg|_{\substack{x=1\\q_1=1}}\right)^2\propto(ay^2+bE^2+cyE+d)^2
\end{equation}
where $y^2=eE^2+g$ and the constants $a$, $b$, $c$, $d$, $e$, and $g$ are defined by
\begin{equation}
  \begin{aligned}
    a&=\frac{f''}{u_f}\left[
      \big(3y_f^2-4ff'f''(f'-f)\big)f'''-8f(f'-f)f''^2(f''-f')
    \right]
    \qquad
      b=2f'f''^2w_f
      \\
      c&=2w_f\sqrt{2f''^3u_f}
      \qquad
      d=-\frac{2f''}{f'}z_f^2w_f
      \qquad
      e=-(f''-f')
      \qquad
      g=z_f\log\frac{f''}{f'}
  \end{aligned}
\end{equation}
The solutions for $\det M=0$ correspond to energies that satisfy
\begin{equation}
  E_{\oldstylenums1\textsc{rsb}}
  =-\sqrt{\frac12\frac{c^2g-2(ade+a^2eg+b(d+ag))\pm|c|\sqrt{4d^2e-4d(b-ae)g+(c^2-4ab)g^2}}{b^2+2abe+e(a^2e-c^2)}}
\end{equation}
The expression inside the inner square root is proportional to
\begin{equation}
  \begin{aligned}
    G_f
    &=
    2(f''-f')u_fw_f
    -2\log^2\frac{f''}{f'}f'^2f''v_f
    \\
    &\qquad
    -f'\log\frac{f''}{f'}\Big[
      4(f'-2f)(f''-f')f''^2-\big(-3(f'-f)f'^2+f'(f'-2f)f''+3ff''^2\big)f'''
    \Big]
  \end{aligned}
\end{equation}
If $G_f>0$, then there are two points along the extremal complexity line where
a solution bifurcates, and a new line of {\oldstylenums1}\textsc{rsb} solutions
between them. Therefore, $G_f>0$ is a necessary condition to see
{\oldstylenums1}\textsc{rsb} in the complexity.

\begin{figure}
  \centering
  \includegraphics{figs/complexity_35.pdf}

  \caption{
    Stationary point statistics as a function of energy density $E$ and
    stability $\mu$ for a $3+5$ model with $\lambda=\frac12$. The dashed
    black line shows the line of zero complexity, where stationary points
    vanish, and enclosed inside they are found in exponential number. The red
    region (blown up in the inset) shows where the annealed complexity gives
    the wrong count and a {\oldstylenums1}\textsc{rsb} complexity in necessary.
    The red points show where $\det M=0$. The gray shaded region highlights the
    minima, which are stationary points with $\mu>\mu_\mathrm m$.
  } \label{fig:complexity_35}
\end{figure}

\begin{equation}
  \mu
  =-\frac{(f_1'+f_0'')u_f}{(2f_1-f_1')f_1'f_0''^{1/2}}
  -\frac{f_1''-f_1'}{f_1'-2f_1}E
\end{equation}

There are implications for the emergence of \textsc{rsb} in equilibrium. Consider a specific $H$ with
\begin{equation}
  H(\pmb\sigma)
  =\frac{\sqrt\lambda}{p!}\sum_{i_1\cdots i_p}J^{(p)}_{i_1\cdots i_p}\sigma_{i_1}\cdots\sigma_{i_2}
  +\frac{\sqrt{1-\lambda}}{s!}\sum_{i_1\cdots i_s}J^{(s)}_{i_1\cdots i_s}\sigma_{i_1}\cdots\sigma_{i_s}
\end{equation}
where the interaction tensors $J$ are drawn from zero-mean normal distributions
with $\overline{(J^{(p)})^2}=p!/2N^{p-1}$ and likewise for $J^{(s)}$. It is
straightforward to confirm that $H$ defined this way has the covariance
property \eqref{eq:covariance} with $f(q)=\frac12\big(\lambda
q^p+(1-\lambda)q^s\big)$. With the $J$s drawn in this way and fixed for $p=3$
and $s=16$, we can vary $\lambda$, and according to Fig.~\ref{fig:phases} we
should see a transition in the type of order at the ground state. What causes
the change? Our analysis indicates that stationary points with the required
order \emph{already exist in the landscape} as unstable saddles for small
$\lambda$, then eventually stabilize into metastable minima and finally become
the lowest lying states. This is different from the picture of existing
uncorrelated low-lying states splitting apart into correlated clusters.

\printbibliography

\end{document}