For reviewer 2

> On page 9, the author points out that there are solutions with complexity 0
> that do not show an extensive barrier "in any situation". First, this "in any
> situation" is quite unclear. Does the author mean above and below the
> threshold energy? Does this solution exist even at high energy?
> Can the author comment on what this solution could imply?

By "in any situation" we mean "for a reference point of any energy and
stability." This includes energies above and below the threshold energy, at
stabilities that imply saddles, minima, or marginal minima, and even for
combinations of energy and stability where the complexity of stationary points
is negative.

The two-point complexity is computed under the condition that the reference
point exists. Given that the reference point exists, there is at least one
point that can be found at zero distance from the reference, that is, at
overlap q = 1: itself. This reasoning alone explains why we should find a
solution with Σ₁₂ = 0, q = 1, and E₀ = E₁, μ₀ = μ₁ for any E₀ and μ₀.

The interpretation is more subtle. Two different stationary points cannot lie
at the same point, but the complexity calculation only resolves numbers of
points that are exponential in N and differences in overlap that are linear in
N. Therefore, the complexity calculation is compatible with many stationary
points being contained in the subextensive region of dimension Δq = O(1/N)
around any reference point. We can reason about where these extremely near
neighbors are likely or unlikely to exist in specific conditions, but the
complexity calculation cannot rule them out.

This point is not crucial to anything in the paper, except to make more
precise the statement that non-threshold marginal minima are separated by a gap
in their overlap. Because marginal minima have very flat directions, they are
good candidates for possessing these extremely near neighbors, and this might
lead one to say they are not isolated. However, if such extremely near
neighbors exist, they are irrelevant to dynamics: the entire group is isolated,
since the complexity of similar stationary points a small but extensive
distance away is negative.

Because the point is not important to the conclusions of the paper, the
paragraph has been revised for clarity and moved to a footnote.

> At the technical level, I am confused by one of the constraints imposed in
> eq.16, when \sigma_1 couples with all replicated s_a. I was expecting a sum
> of \sigma_b.s_a over a and b. This may represent a rotation applied to all
> replicas along a reference direction, which is probably what the author did,
> but there is no comment about that. In general, it would have helped to
> specify the constraints enforced instead of writing simply "Lagrange
> multipliers" before eq.16.

As the referee noticed, only σ₁ appears in the scalar product with sₐ in
equation (16) of the original manuscript. This feature was not introduced in
that equation but in equation (10) of the original manuscript, and the special
status of σ₁ was clarified right after that equation. It arises from the
structure of equation (9): there, the logarithmic expression being averaged
depends only on σ, which corresponds to σ₁ in the following equation, while
σ₂ through σₘ correspond to σ', which is replicated (m - 1) times to bring the
partition function into the numerator.
Therefore, there is a clear reason behind the asymmetry among the replicas
associated with the reference spin, and it was not due to an ad-hoc
transformation as suggested by the referee.

As part of the rewriting of the manuscript for clarity, this subtlety has been
emphasized around what are now equations (9) and (10).

> The author analyses the problem using the Franz-Parisi potential, however,
> this analysis does not seem to matter in the paper. We can read a comment at
> the end of Sec.3.1 but without actual implications. It should be either
> removed or expanded, at the moment it seems just without purpose.

The analysis of the Franz-Parisi potential has been moved to Appendix C, with a
more explanatory discussion of our interest in it included in the manuscript.
In short, the referee is right to point out that it has no implications for the
main topic of the paper. It is included because some specialists will be
interested in the comparison between it and the two-point complexity. This
reasoning is now explained at the beginning of Appendix C.

> "We see arrangements of barriers relative to each other, perhaps...". Why
> "perhaps"? Second, where is this analysis carried out? In the results
> section, the author analyses stable minima and marginal states, I don't know
> where to look. Adding a reference would have helped.

The sentence in question has now been rephrased. The word "perhaps" reflected
the fact that this work reveals little about the mutual arrangement of saddles.
To make clear what conclusions can be drawn about saddles
from our calculation, we have added a new subsection to the Results section,
3.2: Grouping of saddle points. This subsection contains two paragraphs
detailing what one might want to know about the geometry of saddle points, and
what we actually learn from the two-point complexity.

> After eq.3, the author comments on the replica ansatz, but this is out of
> place. We are still introducing the model. It would be better to have it at
> the end of the section (where indeed the author comes back to the same
> concept) or remove it entirely.

The referee is right to point out this oversight, and the note about the
specific influence of the covariance function f on the form of RSB has been
moved into the details for the calculation of the complexity, in subsection
A.4: Replica ansatz and saddle point. Where it was in section 2 we now say

"The choice of *f* has a significant effect on the form of order in the model, and
this likewise influences the geometry of stationary points."

> fig.1, add a caption under each figure saying what they are (oriented
> saddles, oriented minima, etc), it is much easier to read.

We have implemented this helpful suggestion in the new manuscript.

> fig.2, elaborate a bit more in the main text. This is introduced at the of
> the section without any comment.

A paragraph discussing Fig. 2 has been added to the main text at the end of
Section 2.

> fig.3, "the dot-dashed lines on both plots depict the trajectory of the solid
> line on the other plot", which one?

The answer is both. This confusing sentence has been clarified in the new
manuscript. It now reads:

"The dot-dashed line on the left plot depicts the trajectory of the solid line
on the right plot, and the dot-dashed line on the right plot depicts the
trajectory of the solid line on the left plot."

> fig.3, "In this case, the points lying nearest to the reference minimum are
> saddle with mu\<mu, but with energies smaller than the threshold energy", so?
> What is the implication? This misses a conclusion.

These low-lying saddles represent large deviations from the typical complexity.
The point has been clarified by appending "which makes them an atypical
population of saddles" to the sentence.

> Sec.3.1, the author comments on the similarity with the pure model, without
> explaining what is similar. What should we expect on the p-spin? At least the
> relevant aspects. It would also be useful to plot a version of Fig.3 for the
> p-spin. It would make the discussion easier to follow.

In the revised manuscript, the points of comparison with the pure models are
made more explicit, as the referee suggests. We do not think it is necessary to
include a figure for the pure models, instead clarifying the most important
departure in the text:

"The largest difference is the decoupling of nearby stable points from nearby
low-energy points: in the pure *p*-spin model, the left and right panels of
Fig. 3 would be identical up to a constant factor -*p*."

For those interested in more detailed comparisons, the relevant figure for the
pure models is found in the paper twice cited in that subsection.

> "the nearest neighbour points are always oriented saddles", where do I see this?

We have added a sentence to clarify this point:

"This is a result of the persistent presence of a negative isolated eigenvalue
in the spectrum of the nearest neighbors, e.g., as in the shaded regions of
Fig. 3."

> the sentence "like in the pure models, the emergence [...]" is extremely hard
> to parse and the paragraph ends without a conclusion. What are the
> consequences?

This sentence has been expanded for clarity, and the statement now reads

"Like in the pure models, the minimum energy and maximum stability of nearby
points are not monotonic: there is a range of overlap where the minimum energy
of neighbors decreases with proximity. The emergence of oriented index-one
saddles along the line of lowest-energy states at a given overlap occurs at the
local minimum of this line, another similarity with the pure models [13]. It is
not clear why this should be true or what implications it has for behavior."

We also now emphasize that the implications are not known. However, the
coincidence itself is interesting, at the very least for the ability to predict
where an isolated eigenvalue should destabilize nearby minima without making the
computation for the eigenvalue.

> at page 9, the author talk about \Sigma_12 that however has not been defined yet.

The referee is correct to point out this oversight, which has now been amended
by a qualitative definition of Σ₁₂ at the beginning of the results section.

> this section starts without explaining what is the strategy to solve the
> problem. Explaining how the following subsection will contribute to the
> solution without entering into the details of the computation would be of
> great help.

The explanation of the calculation for the complexity has been reorganized and
expanded in the new manuscript. As part of this expansion, we added more
explanation of this kind. Most of this is now found in Appendix A.

> "This replica symmetry will be important later" how? Either we have an
> explanation following or it should be removed.

The comment has been removed in the new manuscript.

> at the end of a step it would be good to wrap everything up. For instance,
> sec.4.2 ends with "we do not include these details, which are standard" at
> least give a reference. Second, add the final result.

In the revised manuscript, more has been done to wrap up each section. For
instance, what was Section 4.2 and is now Section A.2 ends with

"The result of this calculation is found in the effective action (44), where it
contributes all terms besides the functions D contributed by the Hessian terms
in the previous section and the logarithms contributed by the
Hubbard–Stratonovich transformation of the next section."

> "there is a desert where none are found" -> solutions are exponentially rare
> (or something else)

The statement has been rewritten, and now says

"Therefore, marginal minima whose energy *E*₀ is greater than the threshold have
neighbors at arbitrarily close distance with a quadratic pseudogap, while those
whose energy is less than the threshold have an overlap gap."

> I would suggest a rewriting, especially the last sessions (4-6). I understand
> the intention of removing simple details, but they should be replaced by
> comments. The impression (which can be wrong but gives the idea) is of some
> working notes where simple steps have been removed, resulting in
> hard-to-follow computations. Finally, I would also recommend moving these
> sections to an appendix (after acknowledgement and funding).

As suggested by both referees, much of the paper was rewritten and expanded,
especially to provide more details in the calculation of the complexity. It was
also rearranged to put most of those details in appendices.

For reviewer 1

> i) On page 7, when referring to the set of marginal states that attract
> dynamics "as evidenced by power-law relaxations", it would be convenient to
> provide references for this statement.

The evidence of power-law relaxation to marginal minima is contained in G.
Folena and F. Zamponi, On weak ergodicity breaking in mean-field spin glasses,
SciPost Physics 15(3), 109 (2023). In the original manuscript this work was
cited at the end of the sentence, but the sentence has now been rephrased and the
specific point about power-law relaxation has been removed to improve clarity.

> ii) On the same page, the author refers to a quadratic pseudo-gap in the
> complexity function associated with marginal states. It would be helpful to
> have some more indication of how this was derived or, again, to provide
> appropriate references.

The form of the pseudo-gap in overlap for marginal states above the threshold
energy is demonstrated in the subsection on the expansion of the complexity in
the near neighborhood (equation 40 in the original manuscript).

> iii) Section 4 “Calculation of the two-point complexity”. The author states
> that conditioning the Hessian matrix of the stationary points to have a given
> energy and given stability properties influences the statistics of points
> only at the sub-leading order. It would be valuable to clarify the conditions
> under which this occurs. I was thus wondering whether the author can
> straightforwardly generalize such a computation and give some insights in the
> case of a sparse (no longer fully connected) model.

In the manuscript, we have added a small clarification as to the reason for this:

"This is because the conditioning amounts to a rank-one perturbation to the
Hessian matrix, which does not affect the bulk of its spectrum."

From this, one can reason that the same assumptions will hold whenever rank-one
perturbations do not affect the bulk spectrum. While we are not experts in the
theory of sparse matrices, it seems likely this condition breaks down when a
matrix is sufficiently sparse.
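A quick numerical illustration of the rank-one argument, offered as a toy check and not as a calculation from the manuscript: below, a dense GOE matrix stands in for the Hessian (an assumption made for illustration), and a rank-one term θvvᵀ with a random unit vector v plays the role of the conditioning. For θ > 0, Weyl interlacing pins every eigenvalue except the largest between neighbouring eigenvalues of the unperturbed matrix, so the bulk is unchanged, while for θ > 1 a single outlier detaches near θ + 1/θ. The function names `goe` and `rank_one_spectra` are hypothetical.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an n x n GOE matrix whose spectrum approaches the semicircle on [-2, 2]."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


def rank_one_spectra(n: int, theta: float, seed: int = 0):
    """Return sorted eigenvalues of H and of H + theta * v v^T for a random unit vector v."""
    rng = np.random.default_rng(seed)
    h = goe(n, rng)
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    perturbed = h + theta * np.outer(v, v)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(h), np.linalg.eigvalsh(perturbed)


if __name__ == "__main__":
    before, after = rank_one_spectra(1000, theta=3.0)
    # Interlacing: before[i] <= after[i] <= before[i + 1] for every i except the top one.
    tol = 1e-9
    bulk_pinned = np.all(before[:-1] <= after[:-1] + tol) and np.all(after[:-1] <= before[1:] + tol)
    print("bulk interlaces:", bool(bulk_pinned))
    print("second-largest eigenvalue:", after[-2])
    print("outlier:", after[-1], "vs. theta + 1/theta =", 3.0 + 1.0 / 3.0)
```

Swapping `goe` for a sparse random-matrix ensemble would be one way to probe, in the same spirit, whether the bulk stays insensitive to rank-one perturbations in the sparse case raised by the referee.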

> iv) Eq. (34) is quite complicated and difficult to grasp by eye. I thus
> wonder whether the numerical protocol is robust enough to be sure that by
> initializing differently, not exactly at q=0, the same solution is always
> found. How sensitive is the protocol to the choice of initial conditions?

It is true that (34) (now (11)) is quite complicated, but the numerical methods
we use find the same solutions quite robustly. First, we make use of arbitrary
precision arithmetic in Mathematica to ensure that the roots of the saddle
point equations derived from this expression are indeed good roots, in this
case with a 30-digit working precision. Second, the initialization near a good
solution known from analytics, namely the solution at q = 0, is crucial because
if initialized from random conditions a valid solution is never found. If we
initialize the root-finding algorithm using the known solution at q = 0 and
then attempt to solve the equations at some small q > 0, a consistent solution
is found so long as q is sufficiently small. This is also true if the initial
condition is randomly perturbed by a small amount. If the first q > 0 is too
large or the random perturbation is too large, only nonphysical solutions are
found. Luckily, we expect that the complexity of stationary points at different
proximities varies smoothly, so that this procedure is justified.
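The continuation procedure described above can be sketched as follows. This is a toy illustration under stated assumptions, not our Mathematica code: the two hypothetical equations in `residual` stand in for the saddle-point equations, their branch through the known point at q = 0 happens to be available in closed form, and ordinary double-precision Newton iteration replaces the 30-digit arithmetic used in the actual calculation. Each step in q starts Newton's method from the previous root, optionally jittered by a small random amount.

```python
import math
import random


def residual(x: float, y: float, q: float) -> tuple[float, float]:
    """Hypothetical toy stand-in for the saddle-point equations at overlap q."""
    return (x * x + y * y - 1.0 - q, x - y - q)


def jacobian(x: float, y: float) -> tuple[tuple[float, float], tuple[float, float]]:
    """Partial derivatives of residual with respect to (x, y)."""
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))


def newton(x: float, y: float, q: float, tol: float = 1e-13, max_iter: int = 50):
    """Refine (x, y) with Newton's method until the residual norm drops below tol."""
    for _ in range(max_iter):
        f1, f2 = residual(x, y, q)
        if math.hypot(f1, f2) < tol:
            return x, y
        (a, b), (c, d) = jacobian(x, y)
        det = a * d - b * c
        if det == 0.0:
            break
        # Solve J @ (dx, dy) = -(f1, f2) by Cramer's rule.
        dx = (-f1 * d + b * f2) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
    raise RuntimeError(f"Newton's method did not converge at q = {q}")


def continue_branch(q_max: float, n_steps: int, jitter: float = 0.0, seed: int = 0):
    """Follow the solution branch from the known point at q = 0 in small steps of q."""
    rng = random.Random(seed)
    x = y = 1.0 / math.sqrt(2.0)  # exact solution of the toy system at q = 0
    branch = [(0.0, x, y)]
    for k in range(1, n_steps + 1):
        q = q_max * k / n_steps
        # Start from the previous root, optionally perturbed by a small random amount.
        x0 = x + jitter * rng.uniform(-1.0, 1.0)
        y0 = y + jitter * rng.uniform(-1.0, 1.0)
        x, y = newton(x0, y0, q)
        branch.append((q, x, y))
    return branch


if __name__ == "__main__":
    for q, x, y in continue_branch(0.5, 50, jitter=1e-3)[::10]:
        print(f"q = {q:.2f}  x = {x:.12f}  y = {y:.12f}")
```

The design choice mirrors the one described in the text: each solve begins inside the basin of the branch found at the previous q, whereas a large first step or a large perturbation gives the root-finder room to land on a spurious root.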

> v) In Section 5, the analysis of an isolated eigenvalue, which can be
> attributed to a low-rank perturbation in the Hessian matrix, is discussed.
> The technique results from a generalization of a paper recently published by
> H. Ikeda, restricted to a quadratic model though. It would be worthwhile to
> discuss how many of these predictions can be extended to models defined by a
> double-well potential or to optimization problems relying on non-quadratic
> functions (such as ReLu, sigmoid).

The technique is quite generic, and the ability to apply it to other models
rests mostly on the tractability of the saddle point calculation. We presume
the reviewer is referring to the KHGPS model, or to simple neural networks. The
principal challenge in these cases is the Kac–Rice calculation itself, which
has not been extended to systems without Gaussian disorder. If this were
resolved, using this technique to analyse the properties of an isolated
eigenvalue would be a painless corollary. (ReLu is problematic with respect to
these landscape methods, however, because it does not have a well-defined
Hessian everywhere.)

## Requested changes

> I found the paper interesting but quite technical in some points. Moving the
> saddle-point computations and part of the analysis (see for instance on pages
> 11-13 and 18-20) to the supplement would make it easier to capture the main
> results, especially for general readers without extensive expertise in the
> replica trick and these models.

As suggested by both referees, the paper was rearranged to put most of those
details in appendices.