-rw-r--r-- | 2-point.tex | 86 |
-rw-r--r-- | response.md | 151 |
2 files changed, 200 insertions, 37 deletions
diff --git a/2-point.tex b/2-point.tex index 18c5a6b..8f54a6a 100644 --- a/2-point.tex +++ b/2-point.tex @@ -383,9 +383,11 @@ Fig.~\ref{fig:spectra}. their stability, e.g., corresponding with Fig.~\ref{fig:spectra}(d-e). The more darkly shaded are oriented index-one saddles, e.g., corresponding with Fig.~\ref{fig:spectra}(f). The dot-dashed line on the left plot depicts the - trajectory of the solid line on the right plot, and the dot-dashed line on the right plot likewise depicts the solid line on the left plot. In this case, the points - lying nearest to the reference minimum are saddles with $\mu<\mu_\mathrm - m$, but with energies smaller than the threshold energy. + trajectory of the solid line on the right plot, and the dot-dashed line on + the right plot depicts the trajectory of the solid line on the left plot. + In this case, the points lying nearest to the reference minimum are saddles + with $\mu<\mu_\mathrm m$, but with energies smaller than the threshold + energy, which makes them an atypical population of saddles. } \label{fig:min.neighborhood} \end{figure} @@ -397,7 +399,7 @@ Fig.~\ref{fig:min.neighborhood}. For stable minima, the qualitative results for the pure $p$-spin model continue to hold, with some small modifications \cite{Ros_2019_Complexity}. -The largest different is the decoupling of nearby +The largest difference is the decoupling of nearby stable points from nearby low-energy points: in the pure $p$-spin model, the left and right panels of Fig.~\ref{fig:min.neighborhood} would be identical up to a constant factor $-p$. Instead, for mixed models they differ substantially, @@ -406,11 +408,11 @@ would correspond exactly with the solid lines. 
One significant consequence of this difference is the diminished significance of the threshold energy $E_\text{th}$: in the left panel, marginal minima of the threshold energy are the most common among unconstrained points, but marginal minima of lower energy -are more common in the near vicinity of the example reference minimum, whose energy is lower than the threshold energy. +are more common in the near vicinity of the example reference minimum, whose energy is lower than the threshold energy. In the pure models, all marginal minima are at the threshold energy. The nearest neighbor points are always oriented saddles, sometimes saddles with an extensive index and sometimes index-one saddles -(Fig.~\ref{fig:spectra}(d, f)). Like in the pure models, the minimum energy and +(Fig.~\ref{fig:spectra}(d, f)). This is a result of the persistent presence of a negative isolated eigenvalue in the spectrum of the nearest neighbors, e.g., as in the shaded regions of Fig.~\ref{fig:min.neighborhood}. Like in the pure models, the minimum energy and maximum stability of nearby points are not monotonic: there is a range of overlap where the minimum energy of neighbors decreases with proximity. The emergence of oriented index-one saddles along the line of lowest-energy states @@ -552,7 +554,7 @@ extensive energy barriers}. Therefore, the picture of a marginal being connected by subextensive energy barriers can only describe the collection of marginal minima at the threshold energy, which is an atypical population of marginal minima. At energies both below and above the threshold energy, typical marginal minima are isolated from each other.\footnote{ -We must put a small caveat here: in \emph{any} situation, this calculation +We must put a small caveat here: for any combination of energy and stability of the reference point, this calculation admits order-one other marginal minima to lie a subextensive distance from the reference point. 
For such a population of points, $\Sigma_{12}=0$ and $q=1$, which is always a permitted solution when at least one marginal direction @@ -772,8 +774,7 @@ To examine better the population of marginal points, it is necessary to look at the next term in the series of the complexity with $\Delta q$, since the linear coefficient becomes zero at the marginal line. This tells us something intuitive: stable minima have an effective repulsion between points, and one -always finds a sufficiently small $\Delta q$ such that no stationary points are -point any nearer. For the marginal minima, it is not clear that the same should be true. +always finds a sufficiently small $\Delta q$ such that no stationary points are found any nearer. For the marginal minima, it is not clear that the same should be true. For marginal points with $\mu=\mu_\mathrm m$, the linear term above vanishes. Under these conditions, the quadratic term in the expansion for the dominant population of near neighbors is \begin{equation} @@ -1191,7 +1192,7 @@ fields the Hessian is independent of these \cite{Bray_2007_Statistics}. In principle the fact that we have conditioned the Hessian to belong to stationary points of certain energy, stability, and proximity to another stationary point will modify its statistics, but these changes will only appear at subleading -order in $N$ \cite{Ros_2019_Complexity}. This is because the conditioning amounts to a rank-one perturbation to the Hessian matrix. At leading order, the expectations related to different replicas factorize, each yielding +order in $N$ \cite{Ros_2019_Complexity}. This is because the conditioning amounts to a rank-one perturbation to the Hessian matrix, which does not affect the bulk of its spectrum. 
At leading order, the expectations related to different replicas factorize, each yielding \begin{equation} \overline{\big|\det\operatorname{Hess}H(\mathbf s,\omega)\big|\,\delta\big(N\mu-\operatorname{Tr}\operatorname{Hess}H(\mathbf s,\omega)\big)} =e^{N\int d\lambda\,\rho(\lambda+\mu)\log|\lambda|}\delta(N\mu-N\omega) @@ -1259,7 +1260,12 @@ We have written the $H$-dependent terms in this strange form for the ease of tak =e^{\frac12\int d\mathbf t\,d\mathbf t'\,\mathcal O(\mathbf t)\mathcal O(\mathbf t')\overline{H(\mathbf t)H(\mathbf t')}} =e^{N\frac12\int d\mathbf t\,d\mathbf t'\,\mathcal O(\mathbf t)\mathcal O(\mathbf t')f\big(\frac{\mathbf t\cdot\mathbf t'}N\big)} \end{equation} -It remains only to apply the doubled operators to $f$ and then evaluate the simple integrals over the $\delta$ measures. We do not include these details, which were carried out with computer algebra software. +It remains only to apply the doubled operators to $f$ and then evaluate the +simple integrals over the $\delta$ measures. We do not include these details, +which were carried out with computer algebra software. The result of this +calculation is found in the effective action \eqref{eq:intermed.complexity}, +where it contributes all terms besides the functions $\mathcal D$ contributed by the Hessian terms in the previous section and the +logarithms contributed by the Hubbard--Stratonovich transformation of the next section. \subsection{Hubbard--Stratonovich} \label{subsec:hubbard.strat} @@ -1312,7 +1318,16 @@ The integral over the vector fields $\mathbf a$ is Gaussian and can be evaluated Finally, the integral over $\hat Q$ can be evaluated using the saddle point method, giving $\hat Q=Q^{-1}$. Therefore, the term contributed to the effective action in the matrix fields as a result of the transformation is -$N(\frac12+\frac12\log\det Q)$. 
+\begin{equation} + \frac12\log\det Q + = + \frac12\log\det\begin{bmatrix} + C^{00}&iR^{00}&C^{01}&iR^{01}\\ + iR^{00}&D^{00}&iR^{10}&D^{01}\\ + C^{01}&iR^{10}&C^{11}&iR^{11}\\ + iR^{01}&D^{01}&iR^{11}&D^{11} + \end{bmatrix} +\end{equation} \subsection{Replica ansatz and saddle point} \label{subsec:saddle} @@ -1340,7 +1355,7 @@ Defining the `block' fields $\mathcal Q_{00}=(\hat\beta_0, \hat\mu_0, C^{00}, R^{00}, D^{00})$, $\mathcal Q_{11}=(\hat\beta_1, \hat\mu_1, C^{11}, R^{11}, D^{11})$, and $\mathcal Q_{01}=(\hat\mu_{01},C^{01},R^{01},R^{10},D^{01})$ the resulting complexity is -\begin{equation} +\begin{equation} \label{eq:intermed.complexity} \Sigma_{12} =\frac1N\lim_{n\to0}\lim_{m\to0}\frac\partial{\partial n}\int d\mathcal Q_{00}\,d\mathcal Q_{11}\,d\mathcal Q_{01}\,e^{Nm\mathcal S_0(\mathcal Q_{00})+Nn\mathcal S_1(\mathcal Q_{11},\mathcal Q_{01}\mid\mathcal Q_{00})} \end{equation} @@ -1352,7 +1367,7 @@ where &\quad+\frac1m\bigg\{ \frac12\sum_{ab}^m\left[ \hat\beta_1^2f(C^{00}_{ab})+(2\hat\beta_1R^{00}_{ab}-D^{00}_{ab})f'(C^{00}_{ab})+(R_{ab}^{00})^2f''(C_{ab}^{00}) - \right]+\frac12\log\det\begin{bmatrix}C^{00}&R^{00}\\R^{00}&D^{00}\end{bmatrix} + \right]+\frac12\log\det\begin{bmatrix}C^{00}&iR^{00}\\iR^{00}&D^{00}\end{bmatrix} \bigg\} \end{aligned} \end{equation} @@ -1896,22 +1911,48 @@ between the eigenvector $\mathbf x_\text{min}$ associated with the minimum eigen direction connecting the two stationary points $\mathbf x_{0\leftarrow1}$. The overlap between these vectors is directly related to the value of the order parameter $x_0=\frac1N\pmb\sigma_1\cdot\mathbf x_a$. This tangent vector is $\mathbf x_{0\leftarrow 1}=\frac1{1-q}\big(\pmb\sigma_1-q\mathbf s_a\big)$, which is normalized and -lies strictly in the tangent plane of $\mathbf s_a$. Then +lies strictly in the tangent plane of $\mathbf s_a$. 
Then the overlap between the two vectors is \begin{equation} q_\textrm{min}=\frac{\mathbf x_{0\leftarrow 1}\cdot\mathbf x_\mathrm{min}}N =\frac{x_0}{1-q} \end{equation} -where $\mathrm x_\text{min}\cdot\mathrm s_a=0$ because of the restriction of -the $\mathrm x$ vectors to the tangent plane at $\mathrm s_a$. +where $\mathbf x_\text{min}\cdot\mathbf s_a=0$ because of the restriction of +the $\mathbf x$ vectors to the tangent plane at $\mathbf s_a$. \section{Comparison with the Franz--Parisi potential} \label{sec:franz-parisi} -Here, we compute the Franz--Parisi potential for this model at zero +The comparison between the Franz--Parisi potential at zero temperature and the +minimum-energy limit of the two-point complexity is of interest to some +specialists because the two computations qualitatively describe the same thing. +However, it was previously found that the two computations produce different +results in the pure spherical models, to the surprise of those researchers +\cite{Ros_2019_Complexity}. Understanding this difference is subtle. The +zero-temperature Franz--Parisi potential underestimates the energy where nearby +minima are found, because it includes any configuration that is a minimum on +the subspace created by constraining the overlap. Many of these configurations +will not have zero gradient perpendicular to the overlap constraint manifold, +and therefore are not proper minima of the energy. + +A strange feature of the comparison for the pure spherical models was that the +two-point complexity and the Franz--Parisi potential coincided at their local +maximum in $q$. It is not clear why this coincidence occurs, but it is good +news for those who use the Franz--Parisi potential to estimate the height of +the free energy barrier between states. Though it everywhere else +underestimates the energy of nearby states, it correctly gives the value of +this highest barrier. 
+ +Here, we compute the Franz--Parisi potential for the mixed spherical models at zero temperature, with respect to a reference configuration fixed to be a stationary -point of energy $E_0$ and stability $\mu_0$ as before \cite{Franz_1995_Recipes, -Franz_1998_EffectivePotential}. The potential is defined as the average free +point of energy $E_0$ and stability $\mu_0$ \cite{Franz_1995_Recipes, +Franz_1998_EffectivePotential}. Comparing with the lower energy boundary of the +2-point complexity, we find that the story in the mixed models is the same as +that in the pure models: the Franz--Parisi potential underestimates the lowest +energy of nearby minima almost everywhere except at its peak, where the two +measures coincide. + +The potential is defined as the average free energy of a system constrained to lie with a fixed overlap $q$ with a reference configuration (here a stationary point with fixed energy and stability), and given by @@ -2001,11 +2042,6 @@ saddles is found in Fig.~\ref{fig:franz-parisi}. As noted above, there is little qualitatively different from what was found in \cite{Ros_2019_Complexity} for the pure models. -Also like the pure models, there is a correspondence between the maximum of the -zero-temperature Franz--Parisi potential restricted to minima of the specified -type and the local maximum of the neighbor complexity along the line of -lowest-energy states. This is seen in Fig.~\ref{fig:franz-parisi}. - \begin{figure} \centering \includegraphics{figs/franz_parisi.pdf} diff --git a/response.md b/response.md index f6fcfbf..39042a0 100644 --- a/response.md +++ b/response.md @@ -23,10 +23,10 @@ The interpretation is more subtle. Two different stationary points cannot lie at the same point, but the complexity calculation only resolves numbers of points that are exponential in N and differences in overlap that are linear in N. 
Therefore, the complexity calculation is compatible with many stationary -points being contained in the subextensive region of dimension Δq = O(1) around -any reference point. We can reason as to where these extremely near neighbors -are likely or unlikely to exist in specific conditions, but the complexity -calculation cannot rule them out. +points being contained in the subextensive region of dimension Δq = O(1/N) +around any reference point. We can reason as to where these extremely near +neighbors are likely or unlikely to exist in specific conditions, but the +complexity calculation cannot rule them out. This point is not crucial to anything in the paper, except to make more precise the statement that non-threshold marginal minima are separated by a gap @@ -37,6 +37,9 @@ neighbors exist, they are irrelevant to dynamics: the entire group is isolated, since the complexity of similar stationary points at a small but extensive overlap further is negative. +Because the point is not important to the conclusions of the paper, the +paragraph has been revised for clarity and moved to a footnote. + > At the technical level, I am confused by one of the constraints imposed in > eq.16, when \sigma_1 couples with all replicated s_a. I was expecting a sum > of \sigma_b.s_a over a and b. This may represent a rotation applied to all @@ -57,24 +60,47 @@ Therefore, there is a clear reason behind the asymmetry among the replicas associated with the reference spin, and it was not due to an ad-hoc transformation as suggested by the referee. +As part of the rewriting of the manuscript for clarity, this subtlety has been +emphasized around what are now equations (9) and (10). + > The author analyses the problem using the Franz-Parisi potential, however, > this analysis does not seem to matter in the paper. We can read a comment at > the end of Sec.3.1 but without actual implications. It should be either > removed or expanded, at the moment it seems just without purpose.
-The analysis of the Franz-Parisi potential has been moved to an appendix, with -a more explanatory discussion of our interest in it included in the manuscript. +The analysis of the Franz-Parisi potential has been moved to Appendix C, with a +more explanatory discussion of our interest in it included in the manuscript. +In short, the referee is right to point out that it has no implications for the +main topic of the paper. It is included because some specialists will be +interested in the comparison between it and the two-point complexity. This +reasoning is now explained at the beginning of Appendix C. > "We see arrangements of barriers relative to each other, perhaps...". Why > "perhaps"? Second, where is this analysis carried out? In the results > section, the author analyses stable minima and marginal states, I don't know > where to look. Adding a reference would have helped. +The sentence in question has now been rephrased, but "perhaps" was due to the +fact that not very much is learned about the mutual arrangement of saddles from +this work. In order to make clear what conclusions can be drawn about saddles +from our calculation, we have added a new subsection to the Results section, +3.2: Grouping of saddle points. This subsection contains two paragraphs +detailing what one might want to know about the geometry of saddle points, and +what we actually learn from the two-point complexity. + > After eq.3, the author comments on the replica ansatz, but this is out of > place. We are still introducing the model. It would be better to have it at > the end of the section (where indeed the author comes back to the same > concept) or remove it entirely. +The referee is right to point out this oversight, and the note about the +specific influence of the covariance function f on the form of RSB has been +moved into the details for the calculation of the complexity, in subsection +A.4: Replica ansatz and saddle point. 
Where it was in section 2 we now say + +"The choice of *f* has significant effect on the form of order in the model, and +this likewise influences the geometry of stationary points." + > fig.1, add a caption under each figure saying what they are (oriented > saddles, oriented minima, etc), it is much easier to read. @@ -83,43 +109,109 @@ The suggestion of the referee was good and was implemented in the new manuscript > fig.2, elaborate a bit more in the main text. This is introduced at the of > the section without any comment. +A paragraph discussing Fig. 2 has been added to the main text, at the end of +Section 2. + > fig.3, "the dot-dashed lines on both plots depict the trajectory of the solid > line on the other plot", which one? +The answer is both. This confusing sentence has been clarified in the new +manuscript. It now reads: + +"The dot-dashed line on the left plot depicts the trajectory of the solid line +on the right plot, and the dot-dashed line on the right plot depicts the +trajectory of the solid line on the left plot." + > fig.3, "In this case, the points lying nearest to the reference minimum are > saddle with mu\<mu, but with energies smaller than the threshold energy", so? > What is the implication? This misses a conclusion. -These low-lying saddles represent large-deviations from the typical complexity +These low-lying saddles represent large deviations from the typical complexity. +The point has been clarified by appending "which makes them an atypical +population of saddles" to the sentence. > Sec.3.1, the author comments on the similarity with the pure model, without > explaining what is similar. What should we expect on the p-spin? At least the > relevant aspects. It would also be useful to plot a version of Fig.3 for the > p-spin. It would make the discussion easier to follow. +In the revised manuscript, the points of comparison with the pure models are +made more explicit, as the referee suggests.
We do not think it is necessary to +include a figure for the pure models, instead clarifying the most important +departure in the text: + +"The largest difference is the decoupling of nearby stable points from nearby +low-energy points: in the pure *p*-spin model, the left and right panels of +Fig. 3 would be identical up to a constant factor -*p*." + +For those interested in more detailed comparisons, the relevant figure for the +pure models is found in the paper twice cited in that subsection. + > "the nearest neighbour points are always oriented saddles", where do I see this? +We have added a sentence to clarify this point: + +"This is a result of the persistent presence of a negative isolated eigenvalue +in the spectrum of the nearest neighbors, e.g., as in the shaded regions of +Fig. 3." + > the sentence "like in the pure models, the emergence [...]" is extremely hard > to parse and the paragraph ends without a conclusion. What are the > consequences? +This sentence has been expanded to make it more clear, and the statement now reads + +"Like in the pure models, the minimum energy and maximum stability of nearby +points are not monotonic: there is a range of overlap where the minimum energy +of neighbors decreases with proximity. The emergence of oriented index-one +saddles along the line of lowest-energy states at a given overlap occurs at the +local minimum of this line, another similarity with the pure models [13]. It is +not clear why this should be true or what implications it has for behavior." + +We also now emphasize that the implications are not known. However, the +coincidence itself is interesting, at the very least for the ability to predict +where an isolated eigenvalue should destabilize nearby minima without making the +computation for the eigenvalue. + > at page 9, the author talk about \Sigma_12 that however has not been defined yet.
+The referee is correct to point out this oversight, which has now been amended +by a qualitative definition of Σ₁₂ at the beginning of the results section. + > this section starts without explaining what is the strategy to solve the > problem. Explaining how the following subsection will contribute to the > solution without entering into the details of the computation would be of > great help. +The explanation of the calculation for the complexity has been reorganized and +expanded in the new manuscript. As part of this expansion, we added more +explanation of this kind. Most of this is now found in Appendix A. + > "This replica symmetry will be important later" how? Either we have an > explanation following or it should be removed. +The comment has been removed in the new manuscript. + > at the end of a step it would be good to wrap everything up. For instance, > sec.4.2 ends with "we do not include these details, which are standard" at > least give a reference. Second, add the final result. +In the revised manuscript, more has been done to wrap up each section. For instance, what was section 4.2 and is now section A.2 ends + +"The result of this calculation is found in the effective action (44), where it +contributes all terms besides the functions D contributed by the Hessian terms +in the previous section and the logarithms contributed by the +Hubbard–Stratonovich transformation of the next section." + > "there is a desert where none are found" -> solutions are exponentially rare > (or something else) +The statement has been rewritten, and now says + +"Therefore, marginal minima whose energy *E*₀ is greater than the threshold have +neighbors at arbitrarily close distance with a quadratic pseudogap, while those +whose energy is less than the threshold have an overlap gap." + > I would suggest a rewriting, especially the last sessions (4-6). I understand > the intention of removing simple details, but they should be replaced by > comments.
The impression (which can be wrong but gives the idea) is of some @@ -127,6 +219,10 @@ These low-lying saddles represent large-deviations from the typical complexity > hard-to-follow computations. Finally, I would also recommend moving these > sections to an appendix (after acknowledgement and funding). +As suggested by both referees, much of the paper was rewritten and expanded, +especially to provide more details in the calculation of the complexity. It was +also rearranged to put most of those details in appendices. + For reviewer 1 > i) On page 7, when referring to the set of marginal states that attract @@ -156,14 +252,35 @@ the near neighborhood (equation 40 in the original manuscript). > straightforwardly generalize such a computation and give some insights in the > case of a sparse (no longer fully connected) model. -CITE VALENTINA?? +In the manuscript, we have added a small clarification as to the reason for this: + +"This is because the conditioning amounts to a rank-one perturbation to the +Hessian matrix, which does not affect the bulk of its spectrum." + +From this, one can reason that the same assumptions will hold whenever rank-one +perturbations do not affect the bulk spectrum. While we are not experts in the +theory of sparse matrices, it seems likely this condition breaks down when a +matrix is sufficiently sparse. > iv) Eq. (34) is quite complicated and difficult to grasp by eye. I thus > wonder whether the numerical protocol is robust enough to be sure that by > initializing differently, not exactly at q=0, the same solution is always > found. How sensitive is the protocol to the choice of initial conditions? -DO A LITTLE EXPERIMENT +It is true that (34) (now (11)) is quite complicated, but the numeric methods +we use find the same solutions quite robustly. 
First, we make use of arbitrary +precision arithmetic in Mathematica to ensure that the roots of the saddle +point equations derived from this expression are indeed good roots, in this +case with a 30-digit working precision. Second, the initialization near a good +solution known from analytics, namely the solution at q = 0, is crucial because +if initialized from random conditions a valid solution is never found. If we +initialize the root-finding algorithm using the known solution at q = 0 and +then attempt to solve the equations at some small q > 0, a consistent solution +is found so long as q is sufficiently small. This is also true if the initial +condition is randomly perturbed by a small amount. If the first q > 0 is too +large or the random perturbation is too large, only nonphysical solutions are +found. Luckily, we expect that the complexity of stationary points at different +proximities varies smoothly, so that this procedure is justified. > v) In Section 5, the analysis of an isolated eigenvalue, which can be > attributed to a low-rank perturbation in the Hessian matrix, is discussed. @@ -173,9 +290,17 @@ DO A LITTLE EXPERIMENT > double-well potential or to optimization problems relying on non-quadratic > functions (such as ReLu, sigmoid). -ALWAYS QUADRATIC! +The technique is quite generic, and the ability to apply it to other models +rests mostly in the tractability of the saddle point calculation. I guess the +reviewer is referencing the KHGPS model, or simple neural networks. The +principal challenge in these cases is the Kac–Rice calculation itself, which +has not been extended to systems without Gaussian disorder. If this were +resolved, using this technique to analyse the properties of an isolated +eigenvalue would be a painful corollary. (ReLu is problematic with respect to +these landscape methods, however, because it does not have a well-defined +Hessian everywhere.)
-Requested changes +## Requested changes > I found the paper interesting but quite technical in some points. Moving the > saddle-point computations and part of the analysis (see for instance on pages @@ -183,4 +308,6 @@ Requested changes > results, especially for general readers without extensive expertise in the > replica trick and these models. -OK +As suggested by both referees, the paper was rearranged to put most of those +details in appendices. +