I thank the referees for their useful feedback, which led to positive changes
to the manuscript. All changes made since the first submission can be found
highlighted in an attached PDF generated by latexdiff.
In addition to the changes made in response to the referee comments detailed
below, three other changes were made to the resubmitted manuscript:
- References to a "companion paper" were changed to a "related work", since
the two papers are not being considered as companions.
- Soon after submission I identified a mistake in Appendix A regarding the
matrix form of a super linear operator. This mistake did not affect any of
the formulae or results of the rest of the manuscript, but has nevertheless
been amended.
- I found and corrected a spelling mistake in the paragraph after what is now
  equation 67.
Report of the First Referee:
> 1) Terms such as "marginal minima" and "pseudogap" are used without clear
> definitions. These terms refers to different concepts depending on various
> fields, which can yield meaningless confusions. Provide clear definitions for
> these technical terms when they appear at the first time.
The text of the second paragraph of the introduction has been expanded to more
precisely define the terms "marginal minima" and "pseudogap". Its final
sentences now read:
The level set associated with this threshold energy contains mostly
\emph{marginal minima}, or minima whose Hessian matrices have a continuous
spectral density over all sufficiently small positive eigenvalues. In most
circumstances the spectrum is \emph{pseudogapped}, which means that the
spectral density smoothly approaches zero as the eigenvalue approaches zero
from above.
If these definitions are not sufficiently clear to the referee, or if there
are further terms I have neglected to define, I welcome further comment on the
matter.
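The meaning of "pseudogapped" can be illustrated numerically. Below is a minimal NumPy sketch of a textbook example that is not taken from the manuscript. It uses a GOE matrix with off-diagonal variance 1/N, whose semicircle has radius 2, shifted by 2 so that its spectral edge sits at zero. The matrix size, seed and window width are arbitrary illustrative choices.

The limiting density is then √(λ(4−λ))/(2π) on [0, 4]. This vanishes like √λ as λ approaches zero from above, so a window just above zero holds far fewer eigenvalues than an equal-width window in the bulk.

```python
import numpy as np

def shifted_goe_spectrum(n, seed=0):
    """Eigenvalues of a GOE matrix (off-diagonal variance 1/n) shifted by 2,
    so that the semicircle support is approximately [0, 4]."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    a = (g + g.T) / np.sqrt(2 * n)  # symmetric, off-diagonal variance 1/n
    return np.linalg.eigvalsh(a + 2.0 * np.eye(n))

def count_in(eigs, lo, hi):
    """Number of eigenvalues in the half-open window [lo, hi)."""
    return int(np.count_nonzero((eigs >= lo) & (eigs < hi)))

eigs = shifted_goe_spectrum(1000)
width = 0.2
# Just above zero the density behaves like sqrt(lambda) / pi (pseudogap).
near_zero = count_in(eigs, 0.0, width)
# Around the center of the semicircle the density is roughly 1 / pi.
bulk = count_in(eigs, 2.0 - width / 2, 2.0 + width / 2)
print(near_zero, bulk)
```

A spectrum with a true gap would give zero in the first window. The pseudogapped spectrum instead gives a small but nonzero count that shrinks faster than the window width.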
> 2) In eqs (23)-(25), I could not figure out why both notations of L(x,w) and
> H(x, w) are used. If the two notations refer to the identical quantity, it
> should be unified. Otherwise, their difference should be explained.
This confusion stems from a notational ambiguity. The domain of H and the
domain of ∇H are not the same, and writing ∇H(x, ω) is not meant to imply the
existence of a function H(x, ω). I have expanded the text around equations (24)
and (25) in an attempt to clarify this point.
> 3) At the first reading, I am confused with eq. (38). The author writes
> "This is because the trace of $\partial \partial H$ is typically an order of
> $N$ smaller than trace of $\partial \partial g_i". This would be true for
> Hamiltonian of eq. (45). However, does it hold for sums of squared random
> functions such as eq. (71)? Let us consider a trivial case
> $V_i(x) = r_i \cdot x/\sqrt{N}$, where $r_i$ is a random vector from
> $N(0, I_{N\times N})$. This makes eq. (71) a quadratic form of a negative
> definite matrix, for which its trace of Hessian scale as $O(N)$. This may be
> an exceptional case. However, statements such as the above before showing
> concrete target systems can confuse readers. I would like to ask the author
> to amend the writing.
I thank the referee for catching this mistake. Fortunately, its effect on the
manuscript was minor: correctly accounting for cases like the one the referee
describes results only in a constant correction to μ. Since only the relative
value of μ matters for identifying marginal minima, the marginal complexity
calculated while neglecting this correction is still correct. The same holds for
the model examined in the "related work" arXiv:2407.02092, which has such a
linear term.
I have changed the text of the manuscript and several equations to correct this
mistake. The changes can be seen (using the new manuscript's numbering) near
equations 38 and 39, after equation 46, after equation 59, and after equation
75. In Sections IV.C and D this leads to changes in display math that latexdiff
did not capture, in equations 76, 85, 86, D2, D6, D8, and D11; each consists of
replacing μ with μ + f'(0).
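The scaling in the referee's counterexample can be checked directly. The sketch below is a minimal illustration that assumes M = αN linear functions V_i(x) = r_i·x/√N with r_i drawn from N(0, I_N). The ratio α, the sizes and the seed are hypothetical choices, and the overall sign convention of eq. (71) is not reproduced here, since only the scaling of the trace matters.

Under these assumptions, the sum Σ_i V_i(x)² equals the quadratic form xᵀ(RᵀR/N)x. Its Hessian is the constant matrix 2RᵀR/N, whose trace 2Σ_i‖r_i‖²/N has expectation 2M, which is O(N).

```python
import numpy as np

def hessian_trace_of_sum_of_squares(n, alpha=0.5, seed=0):
    """Trace of the Hessian of sum_i V_i(x)^2 with V_i(x) = r_i . x / sqrt(n).

    The sum equals x^T (R^T R / n) x, so its Hessian is the constant matrix
    2 R^T R / n; the M = alpha * n rows r_i of R are drawn from N(0, I_n).
    """
    rng = np.random.default_rng(seed)
    m = int(alpha * n)
    r = rng.standard_normal((m, n))
    hess = 2.0 * r.T @ r / n
    return float(np.trace(hess)), m

for n in (100, 200, 400, 800):
    tr, m = hessian_trace_of_sum_of_squares(n)
    # The expected trace is 2 * m, so tr / n should stay near 2 * alpha = 1
    # rather than shrinking as n grows.
    print(n, round(tr / n, 3))
```

The ratio tr/N stays near a constant as N grows rather than vanishing, which is why such a term contributes at the same order as the trace of ∂∂g_i.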
Report of the Second Referee:
> (1) The first two examples (spherical spin glasses and multi-spherical spin
> glasses) exhibit the property that the complexity of marginal states splits into
> two contributions: the “unconstrained” complexity and a large deviation function
> associated with the smallest eigenvalue of the Hessian. In the text it is
> claimed that this behavior follows from the Gaussian nature of the Hessian. Is
> this statement general? If one constructs models whose Hessians are not
> invariant—for example, with an entry-dependent variance pattern—can one still
> expect this statement to hold?
This question is an astute one, and I cannot speak to whether Gaussianity alone
is a sufficient condition for the separation of the action. Positing properties
of the Hessian is not enough to reason about this, since the key question is
how the correlations between the Hessian, gradient, and energy compare in
magnitude with their self-correlations. One would therefore need to construct an
ensemble of random functions whose Hessian has such a property to begin
addressing this.
Rather than venture into this likely rich line of research, I have simply
clarified in the text after what is now equation 53 that this behavior is
characteristic of isotropic and Gaussian random functions.
> (2) It appears that Eq. (66) and its zero-temperature limit, when evaluated at
> the saddle point, provide a parametrization of the large deviation functions for
> the smallest Hessian eigenvalue, analogous to Eq. (52) for the GOE case. Is
> there any way to express this large deviation function more transparently, or in
> a form that makes the limit ϵ→0 easier to read?
Unfortunately, I have not found a way to express this large deviation function
nicely. The zero-temperature limit of what is now equation 67 is a much more
unwieldy expression than equation 67 itself. It is not suitable for inclusion in
a manuscript, let alone as a source of intuitive insight. Although a
simplification in the ε→0 limit of the kind the referee suggests does exist, it
involves the nontrivial coordination of the limiting saddle-point values of the
variables making up the expression.
> (3) Maybe the author can comment on the relation between his approach and the
> methods developed in the past to track marginal minima (mostly in the sense of
> an isolated eigenvalue of the Hessian rather than pseudogapped), such as:
>
> Marginal states in mean-field glasses
> Markus Müller, Luca Leuzzi, and Andrea Crisanti
> PHYSICAL REVIEW B 74, 134431 2006
I have added a final paragraph to the conclusion discussing the relationship
between these two papers. It reads:
The title of our paper and that of \citeauthor{Muller_2006_Marginal} suggest
they address the same topic, but this is not the case
\cite{Muller_2006_Marginal}. That work differs in three important and
fundamental ways. First, it describes minima of the TAP free energy and
involves peculiarities specific to the TAP. Second, it describes dominant
minima which happen to be marginal, not a condition for finding subdominant
marginal minima. Finally, it focuses on minima with a single soft direction
(which are the typical minima of the low temperature Sherrington--Kirkpatrick
TAP free energy), while we aim to avoid such minima in favor of ones that
have a pseudogap (which we argue are relevant to out-of-equilibrium
dynamics). The fact that the typical minima studied by
\citeauthor{Muller_2006_Marginal} are not marginal in this latter sense may
provide an intuitive explanation for the seeming discrepancy between the
proof that the low-energy Sherrington--Kirkpatrick model cannot be sampled
\cite{ElAlaoui_2022_Sampling} and the proof that a message passing algorithm
can find near-ground states \cite{Montanari_2021_Optimization}: the algorithm
finds the atypical low-lying states that are marginal in the sense considered
here but cannot find the typical ones considered by
\citeauthor{Muller_2006_Marginal}.
> (4) When introducing the method around Eq. (1), it should be stated that this
> works for symmetric matrices A.
The text now reflects this.
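A two-by-two example shows why the symmetry condition matters. The sketch below assumes, purely for illustration, that the method extracts the smallest eigenvalue from the minimum of the quadratic form sᵀAs over unit vectors s; the exact normalization and form of Eq. (1) in the manuscript may differ. Because sᵀAs only sees the symmetric part (A + Aᵀ)/2, for a non-symmetric A the minimum reflects that symmetric part rather than the eigenvalues of A itself. The brute-force circle sampling is a hypothetical stand-in for the analytic method.

```python
import numpy as np

def min_quadratic_form_on_circle(a, samples=100_001):
    """Minimum of s^T A s over unit vectors s in R^2, by dense angular sampling."""
    theta = np.linspace(0.0, 2.0 * np.pi, samples)
    s = np.stack([np.cos(theta), np.sin(theta)])  # shape (2, samples)
    values = np.einsum("is,ij,js->s", s, a, s)    # s^T A s for each sample
    return float(values.min())

a = np.array([[0.0, 2.0],
              [0.0, 0.0]])                          # not symmetric
print(min_quadratic_form_on_circle(a))              # ≈ -1.0
print(np.linalg.eigvals(a))                         # both eigenvalues are 0
print(np.linalg.eigvalsh((a + a.T) / 2).min())      # ≈ -1.0, from the symmetric part
```

Here sᵀAs = 2s₁s₂, whose minimum on the unit circle is −1, while both eigenvalues of A vanish. For symmetric A the two quantities coincide.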
> Typos:
>
> Eq (A8): integral should be over d2
> Page 11, last line second column: “minima can dynamics” —> a verb is missing
> here
These small mistakes have been fixed in the new manuscript.