author | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2023-12-04 15:45:43 +0100 |
---|---|---|
committer | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2023-12-04 15:45:43 +0100 |
commit | ae565b36c22ae41973dabca96be1311e80062119 (patch) | |
tree | 1af575ab33c0b24901fd3f46f940011cd5fc0a9c /response.md | |
parent | 10c5455234d5bbfb36ef9725817ea1c05c2d7fb1 (diff) | |
download | SciPostPhys_16_001-ae565b36c22ae41973dabca96be1311e80062119.tar.gz SciPostPhys_16_001-ae565b36c22ae41973dabca96be1311e80062119.tar.bz2 SciPostPhys_16_001-ae565b36c22ae41973dabca96be1311e80062119.zip |
Big round of edits and reviewer responses.
Diffstat (limited to 'response.md')
-rw-r--r-- | response.md | 151 |
1 files changed, 139 insertions, 12 deletions
diff --git a/response.md b/response.md
index f6fcfbf..39042a0 100644
--- a/response.md
+++ b/response.md
@@ -23,10 +23,10 @@ The interpretation is more subtle. Two different stationary points cannot lie at the same point, but the complexity calculation only resolves numbers of points that are exponential in N and differences in overlap that are linear in N. Therefore, the complexity calculation is compatible with many stationary
-points being contained in the subextensive region of dimension Δq = O(1) around
-any reference point. We can reason as to where these extremely near neighbors
-are likely or unlikely to exist in specific conditions, but the complexity
-calculation cannot rule them out.
+points being contained in the subextensive region of dimension Δq = O(1/N)
+around any reference point. We can reason as to where these extremely near
+neighbors are likely or unlikely to exist in specific conditions, but the
+complexity calculation cannot rule them out.

 This point is not crucial to anything in the paper, except to make more precise the statement that non-threshold marginal minima are separated by a gap
@@ -37,6 +37,9 @@ neighbors exist, they are irrelevant to dynamics: the entire group is isolated, since the complexity of similar stationary points at a small but extensive overlap further is negative.

+Because the point is not important to the conclusions of the paper, the
+paragraph has been revised for clarity and moved to a footnote.
+
 > At the technical level, I am confused by one of the constraints imposed in
 > eq.16, when \sigma_1 couples with all replicated s_a. I was expecting a sum
 > of \sigma_b.s_a over a and b. This may represent a rotation applied to all
@@ -57,24 +60,47 @@ Therefore, there is a clear reason behind the asymmetry among the replicas associated with the reference spin, and it was not due to an ad-hoc transformation as suggested by the referee.

+As part of the rewriting of the manuscript for clarity, this subtlety has been
+emphasized around what are now equations (9) and (10).
+
 > The author analyses the problem using the Franz-Parisi potential, however,
 > this analysis does not seem to matter in the paper. We can read a comment at
 > the end of Sec.3.1 but without actual implications. It should be either
 > removed or expanded, at the moment it seems just without purpose.

-The analysis of the Franz-Parisi potential has been moved to an appendix, with
-a more explanatory discussion of our interest in it included in the manuscript.
+The analysis of the Franz-Parisi potential has been moved to Appendix C, with a
+more explanatory discussion of our interest in it included in the manuscript.
+In short, the referee is right to point out that it has no implications for the
+main topic of the paper. It is included because some specialists will be
+interested in the comparison between it and the two-point complexity. This
+reasoning is now explained at the beginning of Appendix C.

 > "We see arrangements of barriers relative to each other, perhaps...". Why
 > "perhaps"? Second, where is this analysis carried out? In the results
 > section, the author analyses stable minima and marginal states, I don't know
 > where to look. Adding a reference would have helped.

+The sentence in question has now been rephrased, but "perhaps" was due to the
+fact that not very much is learned about the mutual arrangement of saddles from
+this work. In order to make clear what conclusions can be drawn about saddles
+from our calculation, we have added a new subsection to the Results section,
+3.2: Grouping of saddle points. This subsection contains two paragraphs
+detailing what one might want to know about the geometry of saddle points, and
+what we actually learn from the two-point complexity.
+
 > After eq.3, the author comments on the replica ansatz, but this is out of
 > place. We are still introducing the model. It would be better to have it at
It would be better to have it at > the end of the section (where indeed the author comes back to the same > concept) or remove it entirely. +The referee is right to point out this oversight, and the note about the +specific influence of the covariance function f on the form of RSB has been +moved into the details for the calculation of the complexity, in subsection +A.4: Replica ansatz and saddle point. Where it was in section 2 we now say + +"The choice of *f* has significant effect on the form of order in the model, and +this likewise influences the geometry of stationary points." + > fig.1, add a caption under each figure saying what they are (oriented > saddles, oriented minima, etc), it is much easier to read. @@ -83,43 +109,109 @@ The suggestion of the referee was good and was implemented in the new manuscript > fig.2, elaborate a bit more in the main text. This is introduced at the of > the section without any comment. +A paragraph discussing Fig. 2 has been added to the main text, and the end of +Section 2. + > fig.3, "the dot-dashed lines on both plots depict the trajectory of the solid > line on the other plot", which one? +The answer is both. This confusing sentence has been clarified in the new +manuscript. It now reads: + +"The dot-dashed line on the left plot depicts the trajectory of the solid line +on the right plot, and the dot-dashed line on the right plot depicts the +trajectory of the solid line on the left plot." + > fig.3, "In this case, the points lying nearest to the reference minimum are > saddle with mu\<mu, but with energies smaller than the threshold energy", so? > What is the implication? This misses a conclusion. -These low-lying saddles represent large-deviations from the typical complexity +These low-lying saddles represent large deviations from the typical complexity. +The point has been clarified by appending "which makes them an atypical +population of saddles" to the sentence. 

 > Sec.3.1, the author comments on the similarity with the pure model, without
 > explaining what is similar. What should we expect on the p-spin? At least the
 > relevant aspects. It would also be useful to plot a version of Fig.3 for the
 > p-spin. It would make the discussion easier to follow.

+In the revised manuscript, the points of comparison with the pure models are
+made more explicit, as the referee suggests. We do not think it is necessary to
+include a figure for the pure models, instead clarifying the most important
+departure in the text:
+
+"The largest difference is the decoupling of nearby stable points from nearby
+low-energy points: in the pure *p*-spin model, the left and right panels of
+Fig. 3 would be identical up to a constant factor -*p*."
+
+For those interested in more detailed comparisons, the relevant figure for the
+pure models is found in the paper twice cited in that subsection.
+
 > "the nearest neighbour points are always oriented saddles", where do I see this?

+We have added a sentence to clarify this point:
+
+"This is a result of the persistent presence of a negative isolated eigenvalue
+in the spectrum of the nearest neighbors, e.g., as in the shaded regions of
+Fig. 3."
+
 > the sentence "like in the pure models, the emergence [...]" is extremely hard
 > to parse and the paragraph ends without a conclusion. What are the
 > consequences?

+This sentence has been expanded to make it clearer, and the statement now reads:
+
+"Like in the pure models, the minimum energy and maximum stability of nearby
+points are not monotonic: there is a range of overlap where the minimum energy
+of neighbors decreases with proximity. The emergence of oriented index-one
+saddles along the line of lowest-energy states at a given overlap occurs at the
+local minimum of this line, another similarity with the pure models [13]. It is
+not clear why this should be true or what implications it has for behavior."
+
+We also now emphasize that the implications are not known. However, the
+coincidence itself is interesting, at the very least for the ability to predict
+where an isolated eigenvalue should destabilize nearby minima without making the
+computation for the eigenvalue.
+
 > at page 9, the author talk about \Sigma_12 that however has not been defined yet.

+The referee is correct to point out this oversight, which has now been amended
+by a qualitative definition of Σ₁₂ at the beginning of the results section.
+
 > this section starts without explaining what is the strategy to solve the
 > problem. Explaining how the following subsection will contribute to the
 > solution without entering into the details of the computation would be of
 > great help.

+The explanation of the calculation for the complexity has been reorganized and
+expanded in the new manuscript. As part of this expansion, we added more
+explanation of this kind. Most of this is now found in Appendix A.
+
 > "This replica symmetry will be important later" how? Either we have an
 > explanation following or it should be removed.

+The comment has been removed in the new manuscript.
+
 > at the end of a step it would be good to wrap everything up. For instance,
 > sec.4.2 ends with "we do not include these details, which are standard" at
 > least give a reference. Second, add the final result.

+In the revised manuscript, more has been done to wrap up each section. For instance, what was Section 4.2, now Section A.2, ends with:
+
+"The result of this calculation is found in the effective action (44), where it
+contributes all terms besides the functions D contributed by the Hessian terms
+in the previous section and the logarithms contributed by the
+Hubbard–Stratonovich transformation of the next section."
+
 > "there is a desert where none are found" -> solutions are exponentially rare
 > (or something else)

+The statement has been rewritten, and now says:
+
+"Therefore, marginal minima whose energy *E*₀ is greater than the threshold have
+neighbors at arbitrarily close distance with a quadratic pseudogap, while those
+whose energy is less than the threshold have an overlap gap."
+
 > I would suggest a rewriting, especially the last sessions (4-6). I understand
 > the intention of removing simple details, but they should be replaced by
 > comments. The impression (which can be wrong but gives the idea) is of some
@@ -127,6 +219,10 @@ These low-lying saddles represent large-deviations from the typical complexity
 > hard-to-follow computations. Finally, I would also recommend moving these
 > sections to an appendix (after acknowledgement and funding).

+As suggested by both referees, much of the paper was rewritten and expanded,
+especially to provide more details in the calculation of the complexity. It was
+also rearranged to put most of those details in appendices.
+
 For reviewer 1

 > i) On page 7, when referring to the set of marginal states that attract
@@ -156,14 +252,35 @@ the near neighborhood (equation 40 in the original manuscript).
 > straightforwardly generalize such a computation and give some insights in the
 > case of a sparse (no longer fully connected) model.

-CITE VALENTINA??
+In the manuscript, we have added a small clarification as to the reason for this:
+
+"This is because the conditioning amounts to a rank-one perturbation to the
+Hessian matrix, which does not affect the bulk of its spectrum."
+
+From this, one can reason that the same assumptions will hold whenever rank-one
+perturbations do not affect the bulk spectrum. While we are not experts in the
+theory of sparse matrices, it seems likely this condition breaks down when a
+matrix is sufficiently sparse.

 > iv) Eq. (34) is quite complicated and difficult to grasp by eye. I thus
 > wonder whether the numerical protocol is robust enough to be sure that by
 > initializing differently, not exactly at q=0, the same solution is always
 > found. How sensitive is the protocol to the choice of initial conditions?

-DO A LITTLE EXPERIMENT
+It is true that (34) (now (11)) is quite complicated, but the numerical methods
+we use find the same solutions quite robustly. First, we make use of arbitrary
+precision arithmetic in Mathematica to ensure that the roots of the saddle
+point equations derived from this expression are indeed good roots, in this
+case with a 30-digit working precision. Second, the initialization near a good
+solution known from analytics, namely the solution at q = 0, is crucial because
+if initialized from random conditions a valid solution is never found. If we
+initialize the root-finding algorithm using the known solution at q = 0 and
+then attempt to solve the equations at some small q > 0, a consistent solution
+is found so long as q is sufficiently small. This is also true if the initial
+condition is randomly perturbed by a small amount. If the first q > 0 is too
+large or the random perturbation is too large, only nonphysical solutions are
+found. Luckily, we expect that the complexity of stationary points at different
+proximities varies smoothly, so that this procedure is justified.

 > v) In Section 5, the analysis of an isolated eigenvalue, which can be
 > attributed to a low-rank perturbation in the Hessian matrix, is discussed.
@@ -173,9 +290,17 @@ DO A LITTLE EXPERIMENT
 > double-well potential or to optimization problems relying on non-quadratic
 > functions (such as ReLu, sigmoid).

-ALWAYS QUADRATIC!
+The technique is quite generic, and the ability to apply it to other models
+rests mostly on the tractability of the saddle point calculation. We assume the
+reviewer is referring to the KHGPS model, or simple neural networks. The
The +principle challenge in these cases is the Kac–Rice calculation itself, which +has not be extended to systems without Gaussian disorder. If this were +resolved, using this technique to analyse the properties of an isolated +eigenvalue would be a painful corollary. (ReLu is problematic with respect to +these landscape methods, however, because it does not have a well-defined +Hessian everywhere.) -Requested changes +## Requested changes > I found the paper interesting but quite technical in some points. Moving the > saddle-point computations and part of the analysis (see for instance on pages @@ -183,4 +308,6 @@ Requested changes > results, especially for general readers without extensive expertise in the > replica trick and these models. -OK +As suggested by both referees, the paper was rearranged to put most of those +details in appendices. + |