Change addressing report #3, weakness 4.

Added paragraphs discussion the relationship between our conjecture and both different algorithms and different models.
author: Jaron Kent-Dobias <jaron@kent-dobias.com> 2025-03-07 17:49:40 -0300
committer: Jaron Kent-Dobias <jaron@kent-dobias.com> 2025-03-07 17:49:40 -0300
commit: 4f92e57f19f7e306d5293f1289e002e90cf1817e (patch)
tree: fb2db2c664d8d5dbdafe62c5b7e3aefc998b8f7f
parent: e6a4a9c54bcd2a5e8057ceabd6c6940ff896094a (diff)
download: SciPostPhys_18_158-4f92e57f19f7e306d5293f1289e002e90cf1817e.tar.gz
SciPostPhys_18_158-4f92e57f19f7e306d5293f1289e002e90cf1817e.tar.bz2
SciPostPhys_18_158-4f92e57f19f7e306d5293f1289e002e90cf1817e.zip
3 files changed, 67 insertions, 2 deletions
diff --git a/referee_response.md b/referee_response.md
index 771f046..78a3276 100644
--- a/referee_response.md
+++ b/referee_response.md
@@ -126,8 +126,8 @@ Ask for minor revision
  3. Ok
    * The referee is wrong to say that the Euler characteristic of a hypersphere is 2 independent of dimension. The Euler characteristic of all odd-dimensional manifolds is zero. Consider the cell complex on *S*₁ [pictured here](https://kent-dobias.com/files/S_1.png). The Euler characteristic calculated using the alternating sum over the number of cells of increasing dimension is χ(*S*₁) = 1 – 1 = 0.
    * Ok
- 4. Ok
-   * Ok - discuss planting in manuscript, raise skepticism of results of fear paper.
+ 4. The referee points out that previous work on gradient descent in the spherical spin glasses studied gradient descent from both uniformly random initial conditions ("infinite" temperature) and initial conditions drawn from a Boltzmann distribution at some finite temperature, and found that the final state of the dynamics reached marginal minima in a range of energies depending on the initial condition. The conjecture in this manuscript seeks only to explain the upper energy of this range, that associated with gradient descent from a uniformly random initial condition. Presumably there are a variety of behaviors observable by choosing initial conditions using a variety of initial distributions, Boltzmann or otherwise, and one day we may hope to address such questions using similar approaches to this paper. However, this is not addressed here. A small discussion of this point has been added to the manuscript.
+   * A paragraph addressing what might occur in planted models has been added to the manuscript.
  5. Make a supplementary materials file
    * The manuscript has been modified to clarify where a review of superspace methods can be found in the referenced material.
    * The subscript notation associated with the determinant has been explained in a footnote.
diff --git a/topology.bib b/topology.bib
index bb46305..5fc6836 100644
--- a/topology.bib
+++ b/topology.bib
@@ -675,3 +675,31 @@
  eprinttype = {arxiv}
 }
 
+@inproceedings{Mannelli_2019_Who,
+ author = {Sarao Mannelli, Stefano and Biroli, Giulio and Cammarota, Chiara and Krzakala, Florent and Zdeborová, Lenka},
+ title = {Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models},
+ publisher = {Curran Associates, Inc.},
+ year = {2019},
+ volume = {32},
+ pages = {},
+ url = {https://proceedings.neurips.cc/paper_files/paper/2019/file/fbad540b2f3b5638a9be9aa6a4d8e450-Paper.pdf},
+ booktitle = {Advances in Neural Information Processing Systems},
+ editor = {Wallach, H. and Larochelle, H. and Beygelzimer, A. and Alché-Buc, F. d' and Fox, E. and Garnett, R.}
+}
+
+@inproceedings{Mannelli_2019_Passed,
+ author = {Mannelli, Stefano Sarao and Krzakala, Florent and Urbani, Pierfrancesco and Zdeborova, Lenka},
+ title = {Passed \& Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models},
+ publisher = {PMLR},
+ year = {2019},
+ month = {09--15 Jun},
+ volume = {97},
+ pages = {4333--4342},
+ url = {https://proceedings.mlr.press/v97/mannelli19a.html},
+ abstract = {In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. We study a loss function that is the negative log-likelihood of the model. We analyse the number of local minima at a fixed distance from the signal/spike with the Kac-Rice formula, and locate trivialization of the landscape at large signal-to-noise ratios. We evaluate analytically the performance of a gradient flow algorithm using integro-differential PDEs as developed in physics of disordered systems for the Langevin dynamics. We analyze the performance of an approximate message passing algorithm estimating the maximum likelihood configuration via its state evolution. We conclude by comparing the above results: while we observe a drastic slow down of the gradient flow dynamics even in the region where the landscape is trivial, both the analyzed algorithms are shown to perform well even in the part of the region of parameters where spurious local minima are present.},
+ booktitle = {Proceedings of the 36th International Conference on Machine Learning},
+ editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
+ pdf = {http://proceedings.mlr.press/v97/mannelli19a/mannelli19a.pdf},
+ series = {Proceedings of Machine Learning Research}
+}
+
diff --git a/topology.tex b/topology.tex
index 1d97c2c..281831c 100644
--- a/topology.tex
+++ b/topology.tex
@@ -728,6 +728,43 @@ asymptotic values are needed to support or refute this conjecture. This
 motivates working to integrate the \textsc{dmft} equations to longer times, or
 else look for analytic asymptotic solutions that approach $E_\text{sh}$.
 
+The shattering energy appears consistent with the energy reached by gradient
+descent from a uniformly random initial condition, but other algorithms find
+minima at other energies. Optimal message passing algorithms were shown to find
+configurations at an energy level where another topological property---the
+overlap gap property---transitions, and this energy level is believed to bound from below
+all polynomial-time algorithms. On the other hand, physically inspired
+modifications of gradient descent---notably, drawing the initial condition from
+a nonuniform distribution like the Boltzmann distribution with a finite
+temperature---can find energy configurations with energies lower than those
+found with gradient descent from a uniform initial condition. If the
+topological transition described in this paper does predict the asymptotic
+performance of gradient descent from a uniform initial condition, then it
+provides a topological bound from above for the performance of reasonable
+algorithms that terminate in minima. Whether the performance of gradient
+descent from better initial conditions, or of other algorithms like simulated
+annealing, can be predicted with a similar method is not known.
+
+Finally, a common extension of the spherical spin glasses is to add a
+deterministic piece to the energy, sometimes called a signal or a spike. Recent
+work argued that gradient descent can avoid being trapped in marginal minima
+and reach the vicinity of the signal if the set of trapping marginal minima has
+been destabilized by the presence of the signal \cite{Mannelli_2019_Passed,
+Mannelli_2019_Who}. The authors of Ref.~\cite{Mannelli_2019_Who}
+conjecture based on \textsc{dmft} data for $2+3$ mixed spherical spin glasses that the
+trapping marginal minima are those at the traditional threshold energy
+$E_\text{th}$. However, Ref.~\cite{Folena_2023_On} demonstrated that in mixed
+$p+s$ spherical spin glasses with small $p$ and $s$, the difference between
+$E_\text{th}$ and the true trapping energy is difficult to resolve with the
+current precision of \textsc{dmft} integration schemes. Therefore,
+the authors of Ref.~\cite{Mannelli_2019_Who} may have incorrectly conflated the
+threshold with the trapping marginal minima, and that the correct set of
+marginal minima that must be destabilized to reach a signal might be the same set
+that trap dynamics in the signal-free model. This paper conjectures that the important trapping minima
+are those at the shattering energy. Comparing the predictions of
+Ref.~\cite{Mannelli_2019_Who} to \textsc{dmft} simulations of a model with
+better separation between $p$ and $s$ would help resolve this issue.
+
 \section{Conclusion}
 \label{sec:conclusion}
author	Jaron Kent-Dobias <jaron@kent-dobias.com>	2025-03-07 17:49:40 -0300
committer	Jaron Kent-Dobias <jaron@kent-dobias.com>	2025-03-07 17:49:40 -0300
commit	4f92e57f19f7e306d5293f1289e002e90cf1817e (patch)
tree	fb2db2c664d8d5dbdafe62c5b7e3aefc998b8f7f
parent	e6a4a9c54bcd2a5e8057ceabd6c6940ff896094a (diff)
download	SciPostPhys_18_158-4f92e57f19f7e306d5293f1289e002e90cf1817e.tar.gz SciPostPhys_18_158-4f92e57f19f7e306d5293f1289e002e90cf1817e.tar.bz2 SciPostPhys_18_158-4f92e57f19f7e306d5293f1289e002e90cf1817e.zip