author | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2024-06-10 19:29:07 +0200 |
---|---|---|
committer | Jaron Kent-Dobias <jaron@kent-dobias.com> | 2024-06-10 19:29:07 +0200 |
commit | 2de476de17aea90b040875a7e3e87c308944a035 | |
tree | c1e7eeca470f42d3c6fae69f48871b44725a4c8e | |
parent | d02197918d759195dd1d863de28f352f70cbf14c | |
Lots of writing, especially in appendix on superspace.
-rw-r--r-- | marginal.tex | 169 |
1 files changed, 139 insertions, 30 deletions
diff --git a/marginal.tex b/marginal.tex
index 246bcaf..024b0ac 100644
--- a/marginal.tex
+++ b/marginal.tex
@@ -442,6 +442,40 @@ Finally, the marginal complexity is defined by evaluating the complexity conditi
   =\Sigma_0(E,\mu_\text m(E))
 \end{equation}
 
+\subsection{General features of saddle point computation}
+
+\begin{align}
+  \label{eq:delta.grad}
+  &\delta\big(\nabla H(\mathbf x_a,\pmb\omega_a)\big)
+  =\int\frac{d\hat{\mathbf x}_a}{(2\pi)^N}e^{i\hat{\mathbf x}_a^T\nabla H(\mathbf x_a,\pmb\omega_a)} \\
+  \label{eq:delta.energy}
+  &\delta\big(NE-H(\mathbf x_a)\big)
+  =\int\frac{d\hat\beta_a}{2\pi}e^{\hat\beta_a(NE-H(\mathbf x_a))} \\
+  &\delta\big(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\mathbf s\big)
+  \label{eq:delta.eigen}
+  =\int\frac{d\hat\lambda_a}{2\pi}e^{\hat\lambda_a(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\mathbf s)}
+\end{align}
+
+Here we will merely sketch the steps that are standard. We start by translating elements of the Kac--Rice measure into terms more familiar to physicists. This means using the integral representations \eqref{eq:delta.grad}, \eqref{eq:delta.energy}, and \eqref{eq:delta.eigen}
+for the Dirac $\delta$ functions. At this point we will also discuss an
+important step we will use repeatedly in this paper: to drop the absolute value
+signs around the determinant in the Kac--Rice measure. This can potentially
+lead to severe problems with the complexity. However, it is a justified step
+when the parameters of the problem, i.e., $E$, $\mu$, and $\lambda^*$, put us in
+a regime where the exponential majority of stationary points have the same
+index. This is true for maxima and minima, and for saddle points whose spectra have a strictly positive bulk with a fixed number of negative
+outliers. Dropping the absolute value sign allows us to write
+\begin{equation}
+  \det\operatorname{Hess}H(\mathbf x_a, \pmb\omega_a)
+  =\int d\pmb\eta_a\,d\bar{\pmb\eta}_a\,e^{\bar{\pmb\eta}_a^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\pmb\eta_a}
+\end{equation}
+for $N$-dimensional Grassmann variables $\bar{\pmb\eta}_a$ and $\pmb\eta_a$. For
+the spherical models this step is unnecessary, since there are other ways to
+treat the determinant keeping the absolute value signs, as in previous works
+\cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}. However, since some of our other
+examples are for models where the same techniques are impossible, it is
+useful to see the fermionic method in action in this simple case.
+
 \section{Examples}
 
 \subsection{Spherical spin glasses}
@@ -475,36 +509,7 @@ calculation for this case, since we will something about its application in
 more nontrivial settings.
 
 The procedure to treat the complexity of the spherical models has been made in
-detail elsewhere \cite{Kent-Dobias_2023_How}. Here we will merely sketch the steps that are standard. We start by translating elements of the Kac--Rice measure into terms more familiar to physicists. This means writing
-\begin{align}
-  \label{eq:delta.grad}
-  \delta\big(\nabla H(\mathbf x_a,\pmb\omega_a)\big)
-  &=\int\frac{d\hat{\mathbf x}_a}{(2\pi)^N}e^{i\hat{\mathbf x}_a^T\nabla H(\mathbf x_a,\pmb\omega_a)} \\
-  \label{eq:delta.energy}
-  \delta\big(NE-H(\mathbf x_a)\big)
-  &=\int\frac{d\hat\beta_a}{2\pi}e^{\hat\beta_a(NE-H(\mathbf x_a))} \\
-  \delta\big(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\mathbf s\big)
-  \label{eq:delta.eigen}
-  &=\int\frac{d\hat\lambda_a}{2\pi}e^{\hat\lambda_a(N\lambda^*-\mathbf s^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\mathbf s)}
-\end{align}
-for the Dirac $\delta$ functions. At this point we will also discuss an
-important step we will use repeatedly in this paper: to drop the absolute value
-signs around the determinant in the Kac--Rice measure. This can potentially
-lead to severe problems with the complexity. However, it is a justified step
-when the parameters of the problem, i.e., $E$, $\mu$, and $\lambda^*$ put us in
-a regime where the exponential majority of stationary points have the same
-index. This is true for maxima and minima, and for saddle points whose spectra have a strictly positive bulk with a fixed number of negative
-outliers. Dropping the absolute value sign allows us to write
-\begin{equation}
-  \det\operatorname{Hess}H(\mathbf x_a, \pmb\omega_a)
-  =\int d\pmb\eta_a\,d\bar{\pmb\eta}_a\,e^{\bar{\pmb\eta}_a^T\operatorname{Hess}H(\mathbf x_a,\pmb\omega)\pmb\eta_a}
-\end{equation}
-for $N$-dimensional Grassmann variables $\bar{\pmb\eta}_a$ and $\pmb\eta_a$. For
-the spherical models this step is unnecessary, since there are other ways to
-treat the determinant keeping the absolute value signs, as in previous works
-\cite{Folena_2020_Rethinking, Kent-Dobias_2023_How}. However, since other of
-our examples are for models where the same techniques are impossible, it is
-useful to see the fermionic method in action in this simple case.
+detail elsewhere \cite{Kent-Dobias_2023_How}.
 
 Once these substitutions have been made, the entire expression
 \eqref{eq:min.complexity.expanded} is an exponential integral whose argument is
@@ -954,6 +959,110 @@ taking the zero-temperature limit, we find
 
 \appendix
 
+\section{A primer on superspace}
+\label{sec:superspace}
+
+The superspace $\mathbb R^{N|2D}$ is a vector space with $N$ real indices and
+$2D$ Grassmann indices $\bar\theta_1,\theta_1,\ldots,\bar\theta_D,\theta_D$.
+The Grassmann indices anticommute like fermions. Their integration is defined by
+\begin{equation}
+  \int d\theta\,\theta=1
+  \qquad
+  \int d\theta\,1=0
+\end{equation}
+Because the Grassmann indices anticommute, their square is always zero.
+Therefore, any series expansion of a function with respect to a given Grassmann
+index will terminate exactly at linear order, while a series expansion with
+respect to $n$ Grassmann variables will terminate exactly at $n$th order. If
+$f$ is an arbitrary function and $a$ and $b$ are ordinary numbers, then
+\begin{equation}
+  \int d\theta\,f(a+b\theta)
+  =\int d\theta\,\left[f(a)+f'(a)b\theta\right]
+  =f'(a)b
+\end{equation}
+This kind of behavior of integrals over the Grassmann indices makes them useful
+for compactly expressing the Kac--Rice measure. To see why, consider the
+specific superspace $\mathbb R^{N|2}$, where an arbitrary vector can be expressed as
+\begin{equation}
+  \pmb\phi(1)=\mathbf x+\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}
+\end{equation}
+where $\mathbf x,\hat{\mathbf x}\in\mathbb R^N$ and $\bar{\pmb\eta},\pmb\eta$ are
+$N$-dimensional Grassmann vectors. The dependence of $\pmb\phi$ on 1 indicates
+the index of Grassmann variables $\bar\theta_1,\theta_1$ inside, since we will
+sometimes want to use, e.g., $\pmb\phi(2)$ defined identically save for
+substitution by $\bar\theta_2,\theta_2$. Consider the series expansion of an arbitrary function $f$ of this supervector:
+\begin{equation}
+  \begin{aligned}
+    f\big(\pmb\phi(1)\big)
+    &=f(\mathbf x)
+    +\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}\big)^T\partial f(\mathbf x) \\
+    &\quad+\frac12\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1\big)^T\partial\partial f(\mathbf x)\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1\big) \\
+    &=f(\mathbf x)
+    +\big(\bar\theta_1\pmb\eta+\bar{\pmb\eta}\theta_1+\bar\theta_1\theta_1i\hat{\mathbf x}\big)^T\partial f(\mathbf x) \\
+    &\qquad-\bar\theta_1\theta_1\bar{\pmb\eta}^T\partial\partial f(\mathbf x)\pmb\eta
+  \end{aligned}
+\end{equation}
+where in the last step we used the fact that the Hessian matrix is symmetric and
+that squares of Grassmann indices vanish. Using the integration rules defined above, we find
+\begin{equation}
+  \int d\theta_1\,d\bar\theta_1\,f\big(\pmb\phi(1)\big)
+  =i\hat{\mathbf x}^T\partial f(\mathbf x)-\bar{\pmb\eta}^T\partial\partial f(\mathbf x)\pmb\eta
+\end{equation}
+These two terms are precisely the exponents in the integral representations of the Dirac
+$\delta$ function of the gradient and of the determinant of the Hessian (without
+absolute value sign) that make up the basic Kac--Rice measure, so that we can write
+\begin{equation}
+  \begin{aligned}
+    &\int d\mathbf x\,\delta\big(\nabla H(\mathbf x)\big)\,\det\operatorname{Hess}H(\mathbf x) \\
+    &\qquad=\int d\mathbf x\,d\bar{\pmb\eta}\,d\pmb\eta\,\frac{d\hat{\mathbf x}}{(2\pi)^N}\,e^{i\hat{\mathbf x}^T\nabla H(\mathbf x)-\bar{\pmb\eta}^T\operatorname{Hess}H(\mathbf x)\pmb\eta} \\
+    &\qquad=\int d\pmb\phi\,e^{\int d1\,H(\pmb\phi(1))}
+  \end{aligned}
+\end{equation}
+where we have written $d1=d\theta_1\,d\bar\theta_1$ and $d\pmb\phi=d\mathbf
+x\,d\bar{\pmb\eta}\,d\pmb\eta\,\frac{d\hat{\mathbf x}}{(2\pi)^N}$. Besides some deep connections
+to the physics of BRST, this compact notation dramatically simplifies the
+analytical treatment of the problem. This simplification is
+possible because there is a large variety of superspace algebraic and
+integral operations with direct corollaries to their ordinary real
+counterparts. For instance, consider a super linear operator $M(1,2)$, which
+like the supervector $\pmb\phi$ is made up of a linear combination of $N\times
+N$ regular or Grassmann matrices indexed by every nonvanishing combination of
+the Grassmann indices $\bar\theta_1,\theta_1,\bar\theta_2,\theta_2$. Such a supermatrix acts on supervectors by ordinary matrix multiplication and convolution in the Grassmann indices, i.e.,
+\begin{equation}
+  (M\pmb\phi)(1)=\int d2\,M(1,2)\pmb\phi(2)
+\end{equation}
+Integrals involving superfields contracted into such operators result in schematically familiar expressions, like that of the standard Gaussian:
+\begin{equation}
+  \int d\pmb\phi\,e^{\int d1\,d2\,\pmb\phi(1)^TM(1,2)\pmb\phi(2)}
+  =(\operatorname{sdet}M)^{-N/2}
+\end{equation}
+where the usual role of the determinant is replaced by the superdeterminant.
+The superdeterminant can be defined using the ordinary determinant by writing a
+block version of the matrix $M$: if $\mathbf e(1)=\{1,\bar\theta_1\theta_1\}$ is
+the basis of the even subspace of the superspace and $\mathbf
+f(1)=\{\bar\theta_1,\theta_1\}$ is that of the odd subspace, then we can form a
+block representation of $M$ in analogy to the matrix form of an operator in quantum mechanics by
+\begin{equation}
+  \int d1\,d2\,\begin{bmatrix}
+    \mathbf e(1)M(1,2)\mathbf e(2)^T
+    &
+    \mathbf e(1)M(1,2)\mathbf f(2)^T
+    \\
+    \mathbf f(1)M(1,2)\mathbf e(2)^T
+    &
+    \mathbf f(1)M(1,2)\mathbf f(2)^T
+  \end{bmatrix}
+  =\begin{bmatrix}
+    A & B \\ C & D
+  \end{bmatrix}
+\end{equation}
+Then the superdeterminant of $M$ is given by
+\begin{equation}
+  \operatorname{sdet}M=\det(A-BD^{-1}C)\det(D)^{-1}
+\end{equation}
+which is the same as the usual formula for the determinant of a block matrix
+save for the inverse of $\det D$.
+
 \section{Complexity of dominant optima in the least-squares problem}
 \label{sec:dominant.complexity}
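The Grassmann rules in the new appendix can be checked numerically. Below is a minimal sketch in Python under several assumptions. Generators are integer labels, and a Grassmann element is stored as a dictionary from sorted label tuples to coefficients. Polynomials stand in for the arbitrary function $f$. The vector measure $d\pmb\eta\,d\bar{\pmb\eta}$ is read as $\prod_i d\eta_i\,d\bar\eta_i$, an ordering the text leaves unspecified. The names `G`, `canonical`, `berezin`, `exp_nilpotent`, `grassmann_det`, `poly_eval`, `poly_diff`, and `superspace_residual` are hypothetical and are introduced only for this illustration. The sketch checks three things:

- the rules $\int d\theta\,\theta=1$ and $\int d\theta\,1=0$;
- the representation $\det A=\int d\pmb\eta\,d\bar{\pmb\eta}\,e^{\bar{\pmb\eta}^TA\pmb\eta}$;
- the identity $\int d\theta_1\,d\bar\theta_1\,f(\pmb\phi(1))=i\hat{\mathbf x}^T\partial f-\bar{\pmb\eta}^T\partial\partial f\,\pmb\eta$, with $d\bar\theta_1$ acting first.

```python
import itertools

def canonical(labels):
    """Sort generator labels: (sign of the reordering, sorted key); sign 0 if one repeats."""
    if len(set(labels)) < len(labels):
        return 0, ()  # theta * theta = 0
    inversions = sum(a > b for i, a in enumerate(labels) for b in labels[i + 1:])
    return (-1) ** inversions, tuple(sorted(labels))  # each swap of two generators gives -1

class G:
    """Hypothetical sketch: Grassmann element stored as {sorted generator labels: coefficient}."""

    def __init__(self, terms=None):
        self.terms = {k: v for k, v in (terms or {}).items() if v != 0}

    @staticmethod
    def lift(x):  # plain numbers are even elements with only a scalar part
        return x if isinstance(x, G) else G({(): x})

    def __add__(self, other):
        out = dict(self.terms)
        for k, v in G.lift(other).terms.items():
            out[k] = out.get(k, 0) + v
        return G(out)

    __radd__ = __add__  # number + G and sum() fall back to this

    def __mul__(self, other):
        out = {}
        for (k1, v1), (k2, v2) in itertools.product(self.terms.items(), G.lift(other).terms.items()):
            sign, key = canonical(k1 + k2)
            if sign:
                out[key] = out.get(key, 0) + sign * v1 * v2
        return G(out)

    def integrate(self, label):
        """Berezin integral: anticommute the generator to the front and drop it; otherwise 0."""
        out = {}
        for key, v in self.terms.items():
            if label in key:
                pos = key.index(label)
                out[key[:pos] + key[pos + 1:]] = (-1) ** pos * v
        return G(out)

def berezin(f, *labels):
    """int d(l_1) ... d(l_k) f, with the rightmost measure factor acting first."""
    for label in reversed(labels):
        f = f.integrate(label)
    return f

def exp_nilpotent(x):
    """exp(x) for an element with no scalar part, so the series terminates exactly."""
    result, term, k = G.lift(1), G.lift(1), 0
    while term.terms:
        k += 1
        term = term * x * (1 / k)
        result = result + term
    return result

def grassmann_det(a):
    """int prod_i d(eta_i) d(etabar_i) exp(etabar^T A eta); eta_i has label 2i, etabar_i 2i+1."""
    n = len(a)
    eta = [G({(2 * i,): 1}) for i in range(n)]
    etabar = [G({(2 * i + 1,): 1}) for i in range(n)]
    action = sum(etabar[i] * a[i][j] * eta[j] for i in range(n) for j in range(n))
    return berezin(exp_nilpotent(action), *range(2 * n)).terms.get((), 0)

def poly_eval(poly, xs):
    """Evaluate a polynomial {exponent tuple: coefficient} at numbers or even elements xs."""
    total = G.lift(0)
    for exps, c in poly.items():
        term = G.lift(c)
        factors = [xi for xi, e in zip(xs, exps) for _ in range(e)]
        for xi in factors:
            term = term * xi  # even elements commute, so the factor order is irrelevant
        total = total + term
    return total

def poly_diff(poly, i):
    """Partial derivative of a polynomial with respect to variable i."""
    return {e[:i] + (e[i] - 1,) + e[i + 1:]: c * e[i] for e, c in poly.items() if e[i]}

def superspace_residual(poly, x, xhat):
    """Largest |coefficient| of int d(theta) d(thetabar) f(phi) - [i xhat.grad f - etabar^T Hess f eta]."""
    n = len(x)
    theta, thetabar = G({(0,): 1}), G({(1,): 1})  # theta has label 0, thetabar label 1
    eta = [G({(2 + 2 * i,): 1}) for i in range(n)]
    etabar = [G({(3 + 2 * i,): 1}) for i in range(n)]
    # phi_i = x_i + thetabar eta_i + etabar_i theta + thetabar theta (i xhat_i)
    phi = [x[i] + thetabar * eta[i] + etabar[i] * theta + thetabar * theta * (1j * xhat[i])
           for i in range(n)]
    lhs = berezin(poly_eval(poly, phi), 0, 1)  # d1 = d(theta) d(thetabar): thetabar acts first
    grad = [poly_eval(poly_diff(poly, i), x).terms.get((), 0) for i in range(n)]
    hess = [[poly_eval(poly_diff(poly_diff(poly, i), j), x).terms.get((), 0) for j in range(n)]
            for i in range(n)]
    rhs = sum(1j * xhat[i] * grad[i] for i in range(n)) + sum(
        etabar[i] * (-hess[i][j]) * eta[j] for i in range(n) for j in range(n))
    return max((abs(v) for v in (lhs + rhs * -1).terms.values()), default=0.0)
```

For example, `grassmann_det([[1.0, 2.0], [3.0, 4.0]])` should agree with the ordinary determinant $-2$ up to rounding. For a polynomial such as `{(3, 0, 0): 1.0, (1, 2, 0): 2.0, (0, 1, 1): -1.0, (0, 0, 4): 0.5}`, `superspace_residual` should return a value at the level of floating-point round-off. A dictionary of monomials keeps the sign bookkeeping of the anticommuting generators explicit, at the cost of exponential growth in the number of generators. This is acceptable only for the tiny sizes used in such checks.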