Diffstat (limited to 'response.md')
-rw-r--r-- response.md 151
1 files changed, 139 insertions, 12 deletions
diff --git a/response.md b/response.md
index f6fcfbf..39042a0 100644
--- a/response.md
+++ b/response.md
@@ -23,10 +23,10 @@ The interpretation is more subtle. Two different stationary points cannot lie
at the same point, but the complexity calculation only resolves numbers of
points that are exponential in N and differences in overlap that are linear in
N. Therefore, the complexity calculation is compatible with many stationary
-points being contained in the subextensive region of dimension Δq = O(1) around
-any reference point. We can reason as to where these extremely near neighbors
-are likely or unlikely to exist in specific conditions, but the complexity
-calculation cannot rule them out.
+points being contained in the subextensive region of dimension Δq = O(1/N)
+around any reference point. We can reason as to where these extremely near
+neighbors are likely or unlikely to exist in specific conditions, but the
+complexity calculation cannot rule them out.
This point is not crucial to anything in the paper, except to make more
precise the statement that non-threshold marginal minima are separated by a gap
@@ -37,6 +37,9 @@ neighbors exist, they are irrelevant to dynamics: the entire group is isolated,
since the complexity of similar stationary points at a small but extensive
overlap further is negative.
+Because the point is not important to the conclusions of the paper, the
+paragraph has been revised for clarity and moved to a footnote.
+
> At the technical level, I am confused by one of the constraints imposed in
> eq.16, when \sigma_1 couples with all replicated s_a. I was expecting a sum
> of \sigma_b.s_a over a and b. This may represent a rotation applied to all
@@ -57,24 +60,47 @@ Therefore, there is a clear reason behind the asymmetry among the replicas
associated with the reference spin, and it was not due to an ad-hoc
transformation as suggested by the referee.
+As part of the rewriting of the manuscript for clarity, this subtlety has been
+emphasized around what are now equations (9) and (10).
+
> The author analyses the problem using the Franz-Parisi potential, however,
> this analysis does not seem to matter in the paper. We can read a comment at
> the end of Sec.3.1 but without actual implications. It should be either
> removed or expanded, at the moment it seems just without purpose.
-The analysis of the Franz-Parisi potential has been moved to an appendix, with
-a more explanatory discussion of our interest in it included in the manuscript.
+The analysis of the Franz-Parisi potential has been moved to Appendix C, with a
+more explanatory discussion of our interest in it included in the manuscript.
+In short, the referee is right to point out that it has no implications for the
+main topic of the paper. It is included because some specialists will be
+interested in the comparison between it and the two-point complexity. This
+reasoning is now explained at the beginning of Appendix C.
> "We see arrangements of barriers relative to each other, perhaps...". Why
> "perhaps"? Second, where is this analysis carried out? In the results
> section, the author analyses stable minima and marginal states, I don't know
> where to look. Adding a reference would have helped.
+The sentence in question has now been rephrased, but "perhaps" was due to the
+fact that not very much is learned about the mutual arrangement of saddles from
+this work. In order to make clear what conclusions can be drawn about saddles
+from our calculation, we have added a new subsection to the Results section,
+3.2: Grouping of saddle points. This subsection contains two paragraphs
+detailing what one might want to know about the geometry of saddle points, and
+what we actually learn from the two-point complexity.
+
> After eq.3, the author comments on the replica ansatz, but this is out of
> place. We are still introducing the model. It would be better to have it at
> the end of the section (where indeed the author comes back to the same
> concept) or remove it entirely.
+The referee is right to point out this oversight, and the note about the
+specific influence of the covariance function f on the form of RSB has been
+moved into the details for the calculation of the complexity, in subsection
+A.4: Replica ansatz and saddle point. Where it was in section 2 we now say
+
+"The choice of *f* has significant effect on the form of order in the model, and
+this likewise influences the geometry of stationary points."
+
> fig.1, add a caption under each figure saying what they are (oriented
> saddles, oriented minima, etc), it is much easier to read.
@@ -83,43 +109,109 @@ The suggestion of the referee was good and was implemented in the new manuscript
> fig.2, elaborate a bit more in the main text. This is introduced at the of
> the section without any comment.
+A paragraph discussing Fig. 2 has been added to the main text, at the end of
+Section 2.
+
> fig.3, "the dot-dashed lines on both plots depict the trajectory of the solid
> line on the other plot", which one?
+The answer is both. This confusing sentence has been clarified in the new
+manuscript. It now reads:
+
+"The dot-dashed line on the left plot depicts the trajectory of the solid line
+on the right plot, and the dot-dashed line on the right plot depicts the
+trajectory of the solid line on the left plot."
+
> fig.3, "In this case, the points lying nearest to the reference minimum are
> saddle with mu\<mu, but with energies smaller than the threshold energy", so?
> What is the implication? This misses a conclusion.
-These low-lying saddles represent large-deviations from the typical complexity
+These low-lying saddles represent large deviations from the typical complexity.
+The point has been clarified by appending "which makes them an atypical
+population of saddles" to the sentence.
> Sec.3.1, the author comments on the similarity with the pure model, without
> explaining what is similar. What should we expect on the p-spin? At least the
> relevant aspects. It would also be useful to plot a version of Fig.3 for the
> p-spin. It would make the discussion easier to follow.
+In the revised manuscript, the points of comparison with the pure models are
+made more explicit, as the referee suggests. We do not think it is necessary to
+include a figure for the pure models, instead clarifying the most important
+departure in the text:
+
+"The largest difference is the decoupling of nearby stable points from nearby
+low-energy points: in the pure *p*-spin model, the left and right panels of
+Fig. 3 would be identical up to a constant factor -*p*."
+
+For those interested in more detailed comparisons, the relevant figure for the
+pure models is found in the paper twice cited in that subsection.
+
> "the nearest neighbour points are always oriented saddles", where do I see this?
+We have added a sentence to clarify this point:
+
+"This is a result of the persistent presence of a negative isolated eigenvalue
+in the spectrum of the nearest neighbors, e.g., as in the shaded regions of
+Fig. 3."
+
> the sentence "like in the pure models, the emergence [...]" is extremely hard
> to parse and the paragraph ends without a conclusion. What are the
> consequences?
+This sentence has been expanded to make it more clear, and the statement now reads
+
+"Like in the pure models, the minimum energy and maximum stability of nearby
+points are not monotonic: there is a range of overlap where the minimum energy
+of neighbors decreases with proximity. The emergence of oriented index-one
+saddles along the line of lowest-energy states at a given overlap occurs at the
+local minimum of this line, another similarity with the pure models [13]. It is
+not clear why this should be true or what implications it has for behavior."
+
+We also now emphasize that the implications are not known. However, the
+coincidence itself is interesting, at the very least for the ability to predict
+where an isolated eigenvalue should destabilize nearby minima without making the
+computation for the eigenvalue.
+
> at page 9, the author talk about \Sigma_12 that however has not been defined yet.
+The referee is correct to point out this oversight, which has now been amended
+by a qualitative definition of Σ₁₂ at the beginning of the results section.
+
> this section starts without explaining what is the strategy to solve the
> problem. Explaining how the following subsection will contribute to the
> solution without entering into the details of the computation would be of
> great help.
+The explanation of the calculation for the complexity has been reorganized and
+expanded in the new manuscript. As part of this expansion, we added more
+explanation of this kind. Most of this is now found in Appendix A.
+
> "This replica symmetry will be important later" how? Either we have an
> explanation following or it should be removed.
+The comment has been removed in the new manuscript.
+
> at the end of a step it would be good to wrap everything up. For instance,
> sec.4.2 ends with "we do not include these details, which are standard" at
> least give a reference. Second, add the final result.
+In the revised manuscript, more has been done to wrap up each section. For
+instance, what was section 4.2 (now section A.2) ends with
+
+"The result of this calculation is found in the effective action (44), where it
+contributes all terms besides the functions D contributed by the Hessian terms
+in the previous section and the logarithms contributed by the
+Hubbard–Stratonovich transformation of the next section."
+
> "there is a desert where none are found" -> solutions are exponentially rare
> (or something else)
+The statement has been rewritten, and now says
+
+"Therefore, marginal minima whose energy *E*₀ is greater than the threshold have
+neighbors at arbitrarily close distance with a quadratic pseudogap, while those
+whose energy is less than the threshold have an overlap gap."
+
> I would suggest a rewriting, especially the last sessions (4-6). I understand
> the intention of removing simple details, but they should be replaced by
> comments. The impression (which can be wrong but gives the idea) is of some
@@ -127,6 +219,10 @@ These low-lying saddles represent large-deviations from the typical complexity
> hard-to-follow computations. Finally, I would also recommend moving these
> sections to an appendix (after acknowledgement and funding).
+As suggested by both referees, much of the paper was rewritten and expanded,
+especially to provide more details in the calculation of the complexity. It was
+also rearranged to put most of those details in appendices.
+
For reviewer 1
> i) On page 7, when referring to the set of marginal states that attract
@@ -156,14 +252,35 @@ the near neighborhood (equation 40 in the original manuscript).
> straightforwardly generalize such a computation and give some insights in the
> case of a sparse (no longer fully connected) model.
-CITE VALENTINA??
+In the manuscript, we have added a small clarification as to the reason for this:
+
+"This is because the conditioning amounts to a rank-one perturbation to the
+Hessian matrix, which does not affect the bulk of its spectrum."
+
+From this, one can reason that the same assumptions will hold whenever rank-one
+perturbations do not affect the bulk spectrum. While we are not experts in the
+theory of sparse matrices, it seems likely this condition breaks down when a
+matrix is sufficiently sparse.
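
As an illustrative aside (not taken from the manuscript), the statement that a rank-one perturbation leaves the bulk untouched can be made concrete with the standard eigenvalue interlacing inequality. The symbols below are assumptions introduced only for this sketch: $H$ is any real symmetric $N \times N$ matrix with eigenvalues $\lambda_1 \le \dots \le \lambda_N$, $v$ is an arbitrary vector, $\theta \ge 0$ is the strength of the perturbation, and $F_H$ is the empirical cumulative distribution of eigenvalues.

```latex
% Interlacing for H' = H + \theta\, v v^T with \theta \ge 0:
\lambda_i(H) \;\le\; \lambda_i(H') \;\le\; \lambda_{i+1}(H), \qquad i = 1, \dots, N-1,
\qquad
\lambda_N(H) \;\le\; \lambda_N(H') \;\le\; \lambda_N(H) + \theta \, \|v\|^2 .

% Consequence for the empirical eigenvalue distribution:
\sup_x \bigl| F_{H'}(x) - F_H(x) \bigr| \;\le\; \frac{1}{N} .
```

The bound on the distribution vanishes as $N$ grows, so only the extreme eigenvalue can move by a finite amount. This inequality does not by itself say anything about sparse matrices, where other assumptions of the calculation may fail first.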
> iv) Eq. (34) is quite complicated and difficult to grasp by eye. I thus
> wonder whether the numerical protocol is robust enough to be sure that by
> initializing differently, not exactly at q=0, the same solution is always
> found. How sensitive is the protocol to the choice of initial conditions?
-DO A LITTLE EXPERIMENT
+It is true that (34) (now (11)) is quite complicated, but the numeric methods
+we use find the same solutions quite robustly. First, we make use of arbitrary
+precision arithmetic in Mathematica to ensure that the roots of the saddle
+point equations derived from this expression are indeed good roots, in this
+case with a 30-digit working precision. Second, the initialization near a good
+solution known from analytics, namely the solution at q = 0, is crucial because
+if initialized from random conditions a valid solution is never found. If we
+initialize the root-finding algorithm using the known solution at q = 0 and
+then attempt to solve the equations at some small q > 0, a consistent solution
+is found so long as q is sufficiently small. This is also true if the initial
+condition is randomly perturbed by a small amount. If the first q > 0 is too
+large or the random perturbation is too large, only nonphysical solutions are
+found. Luckily, we expect that the complexity of stationary points at different
+proximities varies smoothly, so that this procedure is justified.
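
The continuation strategy described above can be sketched on a toy problem. This is only a minimal illustration in Python and is not the Mathematica code used for the paper: the cubic equation `x**3 - x - q = 0`, the function names `newton` and `continue_in_q`, the step size in `q`, and the use of ordinary floating point instead of 30-digit arithmetic are all assumptions made for the example. The toy equation has several roots at `q = 0`, and the loop follows the one starting at `x = 1` by reusing each solution as the initial guess for the next value of `q`.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration; returns the root, or None if it does not converge."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    return None


def continue_in_q(x_at_zero, q_values):
    """Follow one solution branch of the toy equation x**3 - x - q = 0.

    Each solve is initialized from the solution found at the previous q,
    starting from the known solution x_at_zero at q = 0.
    """
    branch = []
    x = x_at_zero
    for q in q_values:
        x = newton(lambda y: y**3 - y - q, lambda y: 3 * y**2 - 1, x)
        if x is None:
            raise RuntimeError(f"Newton iteration failed at q = {q}")
        branch.append((q, x))
    return branch


# Small steps in q keep each initial guess close to the branch being tracked.
qs = [0.01 * k for k in range(1, 21)]
branch = continue_in_q(1.0, qs)
```

In this toy setting a step in `q` that is too large, or a starting point far from `x = 1`, can send the iteration to one of the other roots, which loosely mirrors the sensitivity to initialization described in the response.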
> v) In Section 5, the analysis of an isolated eigenvalue, which can be
> attributed to a low-rank perturbation in the Hessian matrix, is discussed.
@@ -173,9 +290,17 @@ DO A LITTLE EXPERIMENT
> double-well potential or to optimization problems relying on non-quadratic
> functions (such as ReLu, sigmoid).
-ALWAYS QUADRATIC!
+The technique is quite generic, and the ability to apply it to other models
+rests mostly in the tractability of the saddle point calculation. I guess the
+reviewer is referencing the KHGPS model, or simple neural networks. The
+principal challenge in these cases is the Kac–Rice calculation itself, which
+has not been extended to systems without Gaussian disorder. If this were
+resolved, using this technique to analyse the properties of an isolated
+eigenvalue would be a painful corollary. (ReLu is problematic with respect to
+these landscape methods, however, because it does not have a well-defined
+Hessian everywhere.)
-Requested changes
+## Requested changes
> I found the paper interesting but quite technical in some points. Moving the
> saddle-point computations and part of the analysis (see for instance on pages
@@ -183,4 +308,6 @@ Requested changes
> results, especially for general readers without extensive expertise in the
> replica trick and these models.
-OK
+As suggested by both referees, the paper was rearranged to put most of those
+details in appendices.
+