summaryrefslogtreecommitdiff
path: root/when_annealed.tex
blob: b92e9a5bfeb32179e3e76ef3a41f5eeed5e6081b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
\documentclass[fleqn,a4paper]{article}

\usepackage[utf8]{inputenc} % why not type "Bézout" with unicode?
\usepackage[T1]{fontenc} % vector fonts plz
\usepackage{fullpage,amsmath,amssymb,latexsym,graphicx}
\usepackage{newtxtext,newtxmath} % Times for PR
\usepackage{appendix}
\usepackage[dvipsnames]{xcolor}
\usepackage[
  colorlinks=true,
  urlcolor=MidnightBlue,
  citecolor=MidnightBlue,
  filecolor=MidnightBlue,
  linkcolor=MidnightBlue
]{hyperref} % ref and cite links with pretty colors
\usepackage[
  style=phys,
  eprint=true,
  maxnames = 100
]{biblatex}
\usepackage{anyfontsize,authblk}

\addbibresource{when_annealed.bib}

\begin{document}

\title{
  When is the average number of saddles in a function typical?
}

\author{Jaron Kent-Dobias}
\affil{Istituto Nazionale di Fisica Nucleare, Sezione di Roma I}

\maketitle
\begin{abstract}
  A common measure of the complexity of a function is the count of its
  stationary points. For complicated functions,
  this count often grows exponentially with the volume and dimension of their
  domain. In practice, the count is averaged over a class of such functions
  (the annealed average), but the large numbers involved can result in averages
  biased by extremely rare samples. Typical counts are reliably found by taking
  the average of the logarithm (the quenched average), which is more costly and
  not often done in practice. When most stationary points are uncorrelated with each other, quenched and anneals averages are equal. There are heuristics from equilibrium calculations that guarantee when most of the lowest minima will be uncorrelated. Here, we show that these equilibrium
  heuristics cannot be used to draw conclusions about other minima and saddles. We produce examples among Gaussian-correlated functions on the
  hypersphere where the count of certain saddles and minima has different quenched and
  annealed averages, despite being guaranteed `safe' in the equilibrium
  setting. We produce the necessary conditions for the emergence of nontrivial correlations between saddles. We discuss the implications for the geometry of those functions, and
  in what out-of-equilibrium settings might show a signature of this.
\end{abstract}

Random energies, cost functions, and interaction networks are important to many
areas of modern science. The energy landscape of glasses, the likelihood
landscape of machine learning and high-dimensional inference, and the
interactions between organisms in an ecosystem are just a few examples. A
traditional tool for making sense of their behavior is to analyze the
statistics of points where their dynamics are stationary. For energy or cost
landscapes, these correspond to the minima, maxima, and saddles, while for
ecosystems and other non-gradient dynamical systems these correspond to
equilibria of the dynamics. When many stationary points are present, the system
is considered complex.

Despite the importance of stationary point statistics for understanding complex
behavior, they are often calculated using an uncontrolled approximation.
Because their number is so large, it cannot be reliably averaged. The annealed
approximation takes this average anyway, risking a systematic bias by rare and
atypical samples. The annealed approximation is known to be exact for certain
models and in certain circumstances, but it is used outside those circumstances
without much reflection. In a few cases researches have made instead the
better-controlled quenched average, which averages the logarithm of the number
of stationary points \cite{Ros_2019_Complexity, Kent-Dobias_2023_How,
Ros_2023_Quenched}.

A heuristic line of reasoning for the correctness of the annealed
approximation for the statistics of stationary points is sometimes made when
the annealed approximation is correct for an equilibrium calculation on the same
system. The argument goes like this: since the limit of zero temperature or
noise in an equilibrium calculation concentrates the measure onto the lowest
set of minima, the equilibrium free energy in the limit to zero temperature
should be governed by the same statistics as the count of that lowest set of
minima. This argument is valid, but only for the lowest set of minima, which at
least in glassy problems are rarely relevant to dynamical behavior. What about
the \emph{rest} of the stationary points?

In this paper, we show that the behavior of the ground state, or \emph{any}
equilibrium behavior, does not govern whether stationary points will have a
correct annealed average. In a prototypical family of models of random
functions, we calculate the conditions for when annealed averages should fail
and stationary points should have nontrivial correlations in their mutual
position. We produce examples of models whose equilibrium is guaranteed to
never see correlations between states, but where a population of saddle points
is correlated.

We study the mixed spherical models, which are models of Gaussian-correlated
random functions with isotropic statistics on the $(N-1)$-sphere. Each class of
functions $H:S^{N-1}\to\mathbb R$ is defined by the average covariance between
the function evaluated at two different points $\pmb\sigma_1,\pmb\sigma_2\in
S^{N-1}$, which is a function of the scalar product or overlap between the two
configurations:
\begin{equation} \label{eq:covariance}
  \overline{H(\pmb\sigma_1)H(\pmb\sigma_2)}=\frac1Nf\bigg(\frac{\pmb\sigma_1\cdot\pmb\sigma_2}N\bigg)
\end{equation}
Specifying the covariance function $f$ uniquely specifies the class of functions. The
series coefficients of $f$ need to be nonnnegative in order for $f$ to be a
well-defined covariance. The case where $f$ is a homogeneous polynomial has
been extensively studied, and corresponds to the pure spherical models of glass
physics or the spiked tensor models of statistical inference. Here we will
study cases where $f(q)=\frac12\big(\lambda q^p+(1-\lambda)q^s\big)$ for
$\lambda\in(0,1)$. These are examples of \emph{mixed} spherical models.

These classes of functions have been extensively studied in the physics and
statistics literature, and host a zoo of complex orders and phase transitions
\cite{Crisanti_2004_Spherical, Crisanti_2006_Spherical, Crisanti_2011_Statistical}.

There are two important features which differentiate stationary points $\pmb\sigma^*$ in the
spherical models: their \emph{energy density} $E=\frac1NH(\pmb\sigma^*)$ and
their \emph{stability}
$\mu=\frac1N\operatorname{\mathrm{Tr}}\operatorname{\mathrm{Hess}}H(\pmb\sigma^*)$.
The energy density should be familiar, as the `height' in the landscape. The
stability is so-called because it governs the spectrum of the stationary point.
In each spherical model, the spectrum of each stationary point is a Wigner
semicircle of the same width $\mu_\mathrm m=\sqrt{4f''(1)}$, but shifted by
constant. The stability $\mu$ sets this constant shift. When $\mu<\mu_\mathrm
m$, the spectrum still has support over zero and we have saddles with an
extensive number of downward directions. When $\mu>\mu_\mathrm m$ the spectrum
has support only over positive eigenvalues, and we have stable minima. When
$\mu=\mu_\mathrm m$, the spectrum has a pseudogap, and we have marginal minima.

\begin{figure}
  \centering
  \includegraphics{figs/phases_34.pdf}
  \caption{
    A phase diagram of the boundaries we discuss in this paper for the $3+s$
    model with $f=\frac12\big(\lambda q^3+(1-\lambda)q^s\big)$. The blue region
    shows where there exist some stationary points whose complexity is
    {\oldstylenums1}\textsc{rsb}. The yellow region shows where $f$ is not
    convex and therefore \textsc{rsb} solutions are possible in equilibrium.
    The green region shows where \textsc{rsb} solutions are correct at
    the ground state, adapted from \cite{Auffinger_2022_The}.
  } \label{fig:phases}
\end{figure}

The number $\mathcal N(E,\mu)$ of stationary points with energy density $E$ and
stability $\mu$ is exponential in $N$ for these models. Their complexity $\Sigma(E,\mu)$ is
defined by the average of the logarithm of their number, or
$\Sigma(E,\mu)=\frac1N\overline{\log\mathcal N(E,\mu)}$. More often the annealed
complexity is calculated, where the average is taken before the logarithm:
$\Sigma_\mathrm a(E,\mu)=\frac1N\log\overline{\mathcal N(E,\mu)}$.

When the complexity is calculated using the Kac--Rice formula and a physicists'
tool set, the problem is reduced to the evaluation of an integral by the saddle
point method for large $N$ \cite{Kent-Dobias_2023_How}. An ansatz for
the complexity needs to be made. Here we restrict ourselves to a
{\oldstylenums1}\textsc{rsb} ansatz, which is parameterized by two quantities:
$q_1$ and $x$. They have a geometric interpretation: given a stationary point
fixed with certain properties, $1-x$ corresponds to the proportion of other
stationary points with the same properties that are correlated with it, 
and $q_1$ gives the overlap that this correlated population has with it. In the annealed or replica-symmetric case, $x=1$ and all but a vanishing fraction of stationary points are uncorrelated with each other.
We are guaranteed that
$\Sigma\leq\Sigma_{\oldstylenums1\textsc{rsb}}\leq\Sigma_\mathrm a$, and we
will discuss later in what settings the {\oldstylenums1}\textsc{rsb} complexity
is correct. The complexity is given by extremizing an effective action,
\begin{equation}
  \Sigma_{\oldstylenums1\textsc{rsb}}(E,\mu)=\lim_{n\to0}\int dq_1\,dx\,\mathcal S(q_1,x\mid E,\mu)e^{nN\mathcal S(q_1,x\mid E,\mu)}
  =\mathop{\mathrm{extremum}}_{q_1,x}\mathcal S(q_1,x\mid E,\mu)
\end{equation}
for the action $\mathcal S$ given by
\begin{equation}
  \begin{aligned}
    \mathcal S&(q_1,x\mid E,\mu)
    =\mathcal D(\mu)
    +\mathop{\textrm{extremum}}_{\hat\beta,r_\mathrm d,r_1,d_\mathrm d,d_1}
    \Bigg\{
      \hat\beta E-r_\mathrm d\mu\\
      &+\frac12\bigg[
        \hat\beta^2\big[f(1)-\Delta xf(q_1)\big]
        +(2\hat\beta r_\mathrm d-d_\mathrm d)f'(1)
        -\Delta x(2\hat\beta r_1-d_1)f'(q_1)
        +r_\mathrm d^2f''(1)-\Delta x\,r_1^2f''(q_1) \\
        &+\frac1x\log\Big(
          \big(r_\mathrm d-\Delta x\,r_1\big)^2+d_\mathrm d\big(1-\Delta x\,q_1\big)-\Delta x\,d_1\big(1-\Delta xq_1\big)
          \Big)
          -\frac{\Delta x}x\log\Big(
            (r_\mathrm d-r_1)^2+(d_\mathrm d-d_1)(1-q_1)
          \Big)
      \bigg]
    \Bigg\}
  \end{aligned}
\end{equation}
where $\Delta x=1-x$ and
\begin{equation}
  \mathcal D(\mu)
  =\begin{cases}
    \frac12+\log\left(\frac12\mu_\text m\right)+\frac{\mu^2}{\mu_\text m^2}
     & \mu^2\leq\mu_\text m^2 \\
    \frac12+\log\left(\frac12\mu_\text m\right)+\frac{\mu^2}{\mu_\text m^2}
    -\left|\frac{\mu}{\mu_\text m}\right|\sqrt{\big(\frac\mu{\mu_\text m}\big)^2-1}
    -\log\left(\left|\frac{\mu}{\mu_\text m}\right|-\sqrt{\big(\frac\mu{\mu_\text m}\big)^2-1}\right) & \mu^2>\mu_\text m^2
  \end{cases}
\end{equation}
The extremal problem in $\hat\beta$, $r_\mathrm d$, $r_1$,
$d_\mathrm d$, and $d_1$ has a unique solution and can be found explicitly, but the
resulting formula is much more complicated so we do not include it here. There
can be multiple extrema at which to evaluate $\mathcal S$, in this case the one
for which $\Sigma$ is \emph{smallest} gives the correct solution. There is
always a solution for $x=1$ which is independent of $q_1$, which corresponds to
the replica symmetric case and which is equal to the annealed calculation, so
$\Sigma_\mathrm a(E,\mu)=\mathcal S(E,\mu\mid q_1,1)$. The crux of this paper
will be to determine when this solution is not the global one.

It isn't accurate to say that a solution to the saddle point equations is
`stable' or `unstable.' The problem of solving the complexity in this way is
not a variational problem, so there is nothing to be maximized or minimized,
and in general even global solutions have positive and negative eigenvalues of
the Hessian. However, the eigenvalues of the Hessian can still tell us
something about the emergence of new solutions: when another solution
bifurcates smoothly from an existing one, the Hessian evaluated at that point
will have a zero eigenvalue. Unfortunately this is a difficult procedure to
apply in general, since one must know the parameters in the new solution, and
some parameters, e.g., $q_1$, are unconstrained in the old solution.

There is one place where one can consistently search for a bifurcating solution
to the saddle point equations: along the zero complexity line
$\Sigma(E,\mu)=0$. Going along this line in the replica symmetric solution, the
{\oldstylenums1}\textsc{rsb} complexity transitions at a critical point where
$x=q_1=1$ \cite{Kent-Dobias_2023_How}. Since all the parameters in the
bifurcating solution are known at this point, one can search for it by looking
for a zero eigenvalue in the way described above. In the replica symmetric
solution for points describing saddles, this line is
\begin{equation} \label{eq:extremal.line}
  \mu_0=-\frac1{z_f}\left(2Ef'f''+\sqrt{2f''u_f\bigg(\log\frac{f''}{f'}z_f-E^2(f''-f')\bigg)}\right)
\end{equation}
where we define for brevity (here and elsewhere) the constants
\begin{align}
  u_f=f(f'+f'')-f'^2
  &&
  v_f=f'(f''+f''')-f''^2 \\
  w_f=2f''(f''-f')+f'f'''
  &&
  y_f=f'(f'-f)+f''f
  &&
  z_f=f(f''-f')+f'^2
\end{align}
Note that for $f$ to define a sensible covariance, all of its series
coefficients must be nonnegative. If $f$ has at least two nonzero coefficients
at second order or higher, all of these constants are positive.

Let $M$ be the matrix of double partial derivatives of $\mathcal S$ with
respect to $q_1$ and $x$. We evaluate $M$ at the replica symmetric saddle point
$x=1$ with the additional constraint that $q_1=1$ and along the extremal
complexity line \eqref{eq:extremal.line}. We determine when a zero eigenvalue
appears, indicated the presence of a bifurcating {\oldstylenums1}\textsc{rsb}
solution, by solving $0=\det M$. We find
\begin{equation}
  \det M=-\left(\frac{\partial^2\mathcal S}{\partial q_1\partial x}\bigg|_{\substack{x=1\\q_1=1}}\right)^2\propto(ay^2+bE^2+2cyE-d)^2
\end{equation}
where $y=-\frac12z_f\mu-f'f''E$ is proportional to the square-root term in \eqref{eq:extremal.line} and
the constants $a$, $b$, $c$, and $d$ are defined below. Changing to $y$ is a
convenient choice because the branch of \eqref{eq:extremal.line} is chosen
by the sign of $y$ (the lower-energy branch we are interested in corresponds
with $y>0$) and the relationship between $y$ and $E$ on the extremal line is
$g=2hy^2+eE^2$, where the constants $e$, $g$, and $h$ are given by
\begin{equation}
  \begin{aligned}
    a&=\frac{w_f\big(3y_f^2-4ff'f''(f'-f)\big)-6y_f^2(f''-f')f''}{(u_fz_ff'')^2f'}
    \qquad
    b=\frac{f'w_f}{z_f^2}
      \\
      c&=\frac{w_f}{f''z_f^2}
      \qquad
      d=\frac{w_f}{f'f''}
      \qquad
      e=f''-f'
      \qquad
      g=z_f\log\frac{f''}{f'}
      \qquad
      h=\frac1{f''u_f}
  \end{aligned}
\end{equation}
The solutions for $\det M=0$ correspond to energies that satisfy
\begin{equation}
  E_{\oldstylenums1\textsc{rsb}}^\pm
  =\operatorname{sign}(bg-de)\frac{-cg\pm\sqrt{c^2g^2+(2dh-ag)(bg-de)}}
  {
    \sqrt{2c^2eg+(2bh-ae)(bg-de)\mp2ce\sqrt{c^2g^2+(2dh-ag)(bg-de)}}
  }
\end{equation}
The expression inside the inner square root is proportional to
\begin{equation}
  G_f
  =
  f'\log\frac{f''}{f'}\big[
    3y_f(f''-f')f'''-2(f'-2f)f''w_f
  \big]
  -2(f''-f')u_fw_f
  -2\log^2\frac{f''}{f'}f'^2f''v_f
\end{equation}
If $G_f>0$, then there are two points along the extremal complexity line where
a solution bifurcates, and a new line of {\oldstylenums1}\textsc{rsb} solutions
between them. Therefore, $G_f>0$ is a necessary condition to see
{\oldstylenums1}\textsc{rsb} in the complexity.

\begin{figure}
  \centering
  \includegraphics{figs/complexity_35.pdf}

  \caption{
    Stationary point statistics as a function of energy density $E$ and
    stability $\mu$ for a $3+5$ model with $\lambda=\frac12$. The dashed
    black line shows the line of zero complexity, where stationary points
    vanish, and enclosed inside they are found in exponential number. The red
    region (blown up in the inset) shows where the annealed complexity gives
    the wrong count and a {\oldstylenums1}\textsc{rsb} complexity in necessary.
    The red points show where $\det M=0$. The left point, which is only an
    upper bound on the transition, coincides with it in this case. The gray shaded region highlights the
    minima, which are stationary points with $\mu>\mu_\mathrm m$.
  } \label{fig:complexity_35}
\end{figure}

\begin{figure}
  \centering
  \includegraphics{figs/range_plot_1.pdf}
  \hspace{-3em}
  \includegraphics{figs/range_plot_2.pdf}
  \hspace{-3em}
  \includegraphics{figs/range_plot_3.pdf}
  \hspace{-3em}
  \includegraphics{figs/range_plot_4.pdf} \\
  \vspace{-2em}
  \includegraphics{figs/range_plot_log_1.pdf}
  \hspace{-3em}
  \includegraphics{figs/range_plot_log_2.pdf}
  \hspace{-3em}
  \includegraphics{figs/range_plot_log_3.pdf}
  \hspace{-3em}
  \includegraphics{figs/range_plot_log_4.pdf}

  \caption{
    The range of energies where \textsc{rsb} saddles are found for the $3+s$
    model with varying $s$ and $\lambda$. In the top row the black line shows
    $E_\textrm{min}$, the minimum energy where saddles are found, and in the
    bottom row this energy is subtracted away to emphasize when the
    \textsc{rsb} region crosses into minima. For most $s$, both the top and
    bottom lines are given by $E_{\oldstylenums1\textsc{rsb}}$, but for $s=14$
    there is a portion where the low-energy boundary has $q_1<1$. In that plot,
    the continuation of the $E_{\oldstylenums1\textsc{rsb}}$ line is shown
    dashed.
  }
\end{figure}

\begin{figure}
  \centering
  \includegraphics{figs/order_plot_1.pdf}\\
  \vspace{-1em}
  \includegraphics{figs/order_plot_2.pdf}

  \caption{
    Examples of $3+14$ models where the solution
    $E_{\oldstylenums1\textsc{rsb}}^-$ does and doesn't define the lower limit
    of energies where \textsc{rsb} saddles are found. In both plots the red dot
    shows $E_{\oldstylenums1\textsc{rsb}}^-$, while the solid red lines shows
    the transition boundary with the \textsc{rs} complexity. The dashed black
    line shows the \textsc{rs} zero complexity line, while the solid black line
    shows the {\oldstylenums1}\textsc{rsb} zero complexity line. \textbf{Top:}
    $\lambda=0.67$. Here the end of the transition line that begins at
    $E_{\oldstylenums1\textsc{rsb}}^+$ does not match
    $E_{\oldstylenums1\textsc{rsb}}^-$ but terminates at higher energies.
    $E_{\oldstylenums1\textsc{rsb}}^-$ still corresponds with the lower bound.
    \textbf{Bottom:} $\lambda=0.69$. Here the end of the transition line that
    begins at $E_{\oldstylenums1\textsc{rsb}}^+$ terminates at lower energies
    than $E_{\oldstylenums1\textsc{rsb}}^-$, and therefore its terminus defines
    the lower bound.
  }
\end{figure}

\begin{equation}
  \mu
  =-\frac{(f_1'+f_0'')u_f}{(2f_1-f_1')f_1'f_0''^{1/2}}
  -\frac{f_1''-f_1'}{f_1'-2f_1}E
\end{equation}

There are implications for the emergence of \textsc{rsb} in equilibrium. Consider a specific $H$ with
\begin{equation}
  H(\pmb\sigma)
  =\frac{\sqrt\lambda}{p!}\sum_{i_1\cdots i_p}J^{(p)}_{i_1\cdots i_p}\sigma_{i_1}\cdots\sigma_{i_2}
  +\frac{\sqrt{1-\lambda}}{s!}\sum_{i_1\cdots i_s}J^{(s)}_{i_1\cdots i_s}\sigma_{i_1}\cdots\sigma_{i_s}
\end{equation}
where the interaction tensors $J$ are drawn from zero-mean normal distributions
with $\overline{(J^{(p)})^2}=p!/2N^{p-1}$ and likewise for $J^{(s)}$. It is
straightforward to confirm that $H$ defined this way has the covariance
property \eqref{eq:covariance} with $f(q)=\frac12\big(\lambda
q^p+(1-\lambda)q^s\big)$. With the $J$s drawn in this way and fixed for $p=3$
and $s=16$, we can vary $\lambda$, and according to Fig.~\ref{fig:phases} we
should see a transition in the type of order at the ground state. What causes
the change? Our analysis indicates that stationary points with the required
order \emph{already exist in the landscape} as unstable saddles for small
$\lambda$, then eventually stabilize into metastable minima and finally become
the lowest lying states. This is different from the picture of existing
uncorrelated low-lying states splitting apart into correlated clusters.

\printbibliography

\end{document}