%
% Created by Jaron Kent-Dobias on Thu Apr 20 12:50:56 EDT 2017.
% Copyright (c) 2017 Jaron Kent-Dobias. All rights reserved.
%
\documentclass[aps,prl,reprint]{revtex4-1}

\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,latexsym,mathtools,xifthen}
\usepackage{algcompatible}
\usepackage{algorithm}

% uncomment to label only equations that are referenced in the text
%\mathtoolsset{showonlyrefs=true}

% I want labels but don't want to type out ``equation''
\def\[{\begin{equation}}
\def\]{\end{equation}}

% math not built-in
\def\arcsinh{\mathop{\mathrm{arcsinh}}\nolimits}
\def\arccosh{\mathop{\mathrm{arccosh}}\nolimits}
\def\ei{\mathop{\mathrm{Ei}}\nolimits} % exponential integral Ei
\def\re{\mathop{\mathrm{Re}}\nolimits}
\def\im{\mathop{\mathrm{Im}}\nolimits}
\def\sgn{\mathop{\mathrm{sgn}}\nolimits}
\def\dd{d} % differential
\def\O{O} % big O
\def\o{o} % little O
\def\Z{\mathbb Z}
\def\R{\mathbb R}
\def\G{S}

% subscript for ``critical'' values, e.g., T_\c
\def\c{\mathrm c}

% scaling functions
\def\fM{\mathcal M} % magnetization
\def\fX{\mathcal Y} % susceptibility
\def\fF{\mathcal F} % free energy
\def\fiF{\mathcal H} % imaginary free energy
\def\fS{\mathcal S} % surface tension
\def\fG{\mathcal G} % exponential factor
\def\H{\mathcal H}

% lattice types
\def\sq{\mathrm{sq}}
\def\tri{\mathrm{tri}}
\def\hex{\mathrm{hex}}

% dimensions
\def\dim{d}
\def\twodee{\textsc{2d} }
\def\threedee{\textsc{3d} }
\def\fourdee{\textsc{4d} }

% fancy partial derivative
\newcommand\pd[3][]{
  \ifthenelse{\isempty{#1}}
  {\def\tmp{}}
  {\def\tmp{^#1}}
  \frac{\partial\tmp#2}{\partial#3\tmp}
}

\newcommand\nn[2]{\langle#1#2\rangle}
\newcommand\avg[1]{\langle#1\rangle}
\newcommand\eavg[1]{\langle#1\rangle_{\mathrm e}}
\newcommand\mavg[1]{\langle#1\rangle_{\mathrm m}}
\def\e{\mathrm e}
\def\m{\mathrm m}
\newcommand\new[1]{\tilde{#1}}
\renewcommand\vec[1]{\mathbf{#1}}
\newcommand\unit[1]{\hat{\mathbf{#1}}}

% used to reformat display math to fit in two-column ``reprint'' mode
\makeatletter
\newif\ifreprint
\@ifclasswith{revtex4-1}{reprint}{\reprinttrue}{\reprintfalse}
\makeatother

\begin{document}

\title{Efficiently sampling Ising states in an external field}
\author{Jaron Kent-Dobias}
\author{James P.~Sethna}
\affiliation{Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY, USA}
\date\today

\begin{abstract}
  We introduce an extension of the Wolff algorithm that performs efficiently in an external magnetic field. Near the Ising critical point, the correlation time of our algorithm has a conventional scaling form that reduces to that of the Wolff algorithm at zero field. As an application, we directly measure scaling functions of observables in the metastable state of the 2D Ising model.
\end{abstract}

\maketitle

The Ising model is a simple model of a magnet made up of locally interacting spins. As with most large thermal systems, its properties cannot be computed explicitly; they are instead estimated using Monte Carlo methods. Near its continuous phase transition, a divergent correlation length leads to a divergent correlation time in any locally updating algorithm, hampering computation. At zero external field, this problem was largely alleviated by cluster algorithms, like the Wolff algorithm, whose dynamics are nonlocal: each step flips a group of spins whose typical size diverges with the correlation length. However, the Wolff algorithm only works at zero field. We describe an extension of this algorithm that works in an arbitrary external field while preserving the Wolff algorithm's small dynamic exponent.
The Wolff algorithm works by first choosing a random spin and adding it to an empty cluster. Every neighbor of that spin pointed in the same direction as the spin is added to the cluster with probability $1-e^{-2\beta J}$, where $\beta=1/T$ and $J$ is the coupling between sites. This process is then repeated for the neighbors of every spin added to the cluster. When all sites surrounding the cluster have been exhausted, the cluster is flipped. Our algorithm is a simple extension of this. An extra spin is introduced (often referred to as a ``ghost spin'') that couples to all others with coupling $H$. The traditional Wolff algorithm is then performed on this larger lattice exactly as described above, with the extra spin treated no differently from any other. Observables of the original system can be exactly estimated on the new one using a simple mapping. As an application, we use our algorithm to measure critical scaling functions of the 2D Ising model in its metastable phase.

\section{Introduction}

Consider an undirected graph $G=(V,E)$ describing a system of interacting spins. The set of vertices $V=\{1,\ldots,n\}$ enumerates the sites of the network, and the set of edges $E$ describes the connections between interacting sites. On each site is a spin that can take any value from the set $S=\{-1,1\}$. The state of the system is described by a function $s:V\to S$, leading to a configuration space of all possible states $S^n=S\times\cdots\times S$. The Hamiltonian $\H:S^n\to\R$ gives the energy of a particular state $s\in S^n$ and is defined by
\[
  \H(s)=-\sum_{\{i,j\}\in E}J_{ij}s_is_j-HM(s),
  \label{eq:ham}
\]
where $H$ is the external magnetic field, $J:E\to\R$ gives the coupling between spins on connected sites, and $M:S^n\to\R$ is the magnetization of the system, defined for a state $s\in S^n$ by
\[
  M(s)=\sum_{i\in V}s_i.
\]
For the purpose of this study we consider only ferromagnetic systems, where the function $J$ is nonnegative. All formal results can be generalized to the antiferromagnetic or mixed cases, though properties such as algorithmic efficiency cannot.

An observable of this system is a function $A:S^n\to\R$ depending on the system's state. Both the Hamiltonian and the magnetization defined above are observables. Assuming the ergodic hypothesis holds, the expected value $\avg A$ of any observable $A$ is defined by its average over every state $s$ in the configuration space $S^n$ weighted by the Boltzmann factor $e^{-\beta\H(s)}$, or
\[
  \avg A=\frac1Z\sum_{s\in S^n}e^{-\beta\H(s)}A(s),
  \label{eq:avg.obs}
\]
where $\beta$ is the inverse temperature and the partition function $Z$, defined by
\[
  Z=\sum_{s\in S^n}e^{-\beta\H(s)},
\]
gives the correct normalization for the weighted sum.

The sum over configurations in \eqref{eq:avg.obs} is intractable for all but very small systems, so expected values of observables must usually be approximated. Monte Carlo methods are a common way of accomplishing this. These methods sample states $s$ in the configuration space $S^n$ according to the Boltzmann distribution $e^{-\beta\H(s)}$, so that averages of observables over the samples asymptotically approach the true expected values. The Metropolis--Hastings algorithm \cite{metropolis1953equation,hastings1970monte} is very popular for systems in statistical physics. A starting state $s\in S^n$ is randomly perturbed to a state $s'$, usually by flipping one spin. The change in energy $\Delta\H=\H(s')-\H(s)$ due to the perturbation is then computed.
If the change is negative, the perturbed state $s'$ is accepted as the new state. Otherwise, the perturbed state is accepted with probability $e^{-\beta\Delta\H}$. This process is repeated indefinitely, and samples are collected by recording the state $s$ between iterations. The algorithm is shown schematically in Algorithm \ref{alg:met-hast}. Metropolis--Hastings is very general, but unless the perturbations are very carefully chosen the algorithm suffers in regimes where large correlations are present in the system, for instance near continuous phase transitions. There the algorithm experiences what is known as critical slowing down: likely states consist of large correlated clusters, and moving between such states in configuration space requires very many single-spin perturbations.

\begin{figure}
  \begin{algorithm}[H]
    \begin{algorithmic}
      \REQUIRE $s\in S^n$
      \STATE $s'\gets$ \texttt{Perturb}($s$)
      \STATE $\Delta\H\gets\H(s')-\H(s)$
      \IF {$\exp(-\beta\Delta\H)>$ \texttt{UniformRandom}$(0,1)$}
        \STATE $s\gets s'$
      \ENDIF
    \end{algorithmic}
    \caption{Metropolis--Hastings}
    \label{alg:met-hast}
  \end{algorithm}
\end{figure}

The Wolff algorithm \cite{wolff1989collective} solves many of these problems, but only at zero external field, $H=0$. It avoids critical slowing down by flipping carefully constructed clusters of spins at once, in a way that samples highly correlated states quickly while always accepting prospective states. A random site $i$ is selected from the graph and its spin is flipped. Each of the site's neighbors $j$ is then also flipped with probability $1-e^{-2\beta|J_{ij}|}$ if doing so would lower the energy of the bond $i$--$j$, i.e., if it was originally aligned with $i$. The process is repeated with every neighbor that was flipped. While this algorithm successfully addresses the problem of critical slowing down at zero field, it does not directly apply at nonzero field. A common extension is the hybrid Wolff--Metropolis, where a Wolff cluster is formed using the ordinary formula and then accepted or rejected Metropolis-style based on the change in energy due to the external field, $-H\,\Delta M$. Because each spin has the same coupling to the external field, a strictly more efficient but exactly equivalent version of this hybrid is made by distributing the cluster acceptance or rejection across the spins in the cluster as they are added. In this version, the cluster is abandoned with probability $1-e^{-2\beta H}$ every time a spin is added to it, so that a cluster of $|c|$ spins flipped against the field survives with the Metropolis acceptance probability $e^{-2\beta H|c|}$.
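To make the local update concrete, the following is a minimal Python sketch of the single-spin-flip Metropolis--Hastings step of Algorithm~\ref{alg:met-hast}, specialized to the Hamiltonian \eqref{eq:ham} on a periodic square lattice with uniform coupling $J$; the function and variable names are purely illustrative and not taken from any particular library.
\begin{verbatim}
import numpy as np

def metropolis_sweep(s, beta, J=1.0, H=0.0, rng=None):
    """One sweep of single-spin-flip Metropolis--Hastings
    on an L x L periodic square lattice.  The spins s are
    stored as an array of +/-1; all names are illustrative."""
    rng = rng or np.random.default_rng()
    L = s.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # sum of the four nearest-neighbor spins (periodic)
        nn = (s[(i + 1) % L, j] + s[(i - 1) % L, j]
              + s[i, (j + 1) % L] + s[i, (j - 1) % L])
        # energy change if spin (i, j) were flipped
        dE = 2.0 * s[i, j] * (J * nn + H)
        # accept downhill moves, else with prob e^{-beta dE}
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] *= -1
    return s
\end{verbatim}
One call attempts $L^2$ single-spin flips; near the critical point many such sweeps are needed to decorrelate the large clusters described above.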
At zero field, the Wolff algorithm is extremely efficient near the critical point: its dynamic exponent for the 2D Ising model has been measured as $z=0.29(1)$ \cite{wolff1989comparison,liu2014dynamic}, compared with $z=0.35(1)$ for the related Swendsen--Wang algorithm \cite{swendsen1987nonuniversal}.

\begin{figure}
  \begin{algorithm}[H]
    \begin{algorithmic}
      \REQUIRE $s\in\G^n$
      \STATE \textbf{let} $q$ be an empty stack
      \STATE \textbf{let} $i_0\in V$
      \STATE $q.\mathtt{push}(i_0)$
      \STATE $\sigma\gets s_{i_0}$
      \WHILE {$q$ is not empty}
        \STATE $i\gets q.\mathtt{pop}$
        \IF {$s_i=\sigma$}
          \FORALL {$j$ such that $\{i,j\}\in E$}
            \IF {$1-\exp(-2\beta J_{ij}s_is_j)>\mathop{\mathtt{UniformRandom}}(0,1)$}
              \STATE $q.\mathtt{push}(j)$
            \ENDIF
          \ENDFOR
          \STATE $s_i\gets-s_i$
        \ENDIF
      \ENDWHILE
    \end{algorithmic}
    \caption{Wolff (Zero-Field)}
    \label{alg:wolff}
  \end{algorithm}
\end{figure}

\begin{figure}
  \begin{algorithm}[H]
    \begin{algorithmic}
      \REQUIRE $s\in\G^n$
      \STATE $s'\gets s$
      \STATE \textbf{let} $q$ be an empty stack
      \STATE \textbf{let} $i_0\in V$
      \STATE $q.\mathtt{push}(i_0)$
      \STATE $\sigma\gets s_{i_0}'$
      \STATE \texttt{completed} $\gets$ \textbf{true}
      \WHILE {$q$ is not empty}
        \STATE $i\gets q.\mathtt{pop}$
        \IF {$s_i'=\sigma$}
          \FORALL {$j$ such that $\{i,j\}\in E$}
            \IF {$1-\exp(-2\beta J_{ij}s_i's_j')>\mathop{\mathtt{UniformRandom}}(0,1)$}
              \STATE $q.\mathtt{push}(j)$
            \ENDIF
          \ENDFOR
          \STATE $s_i'\gets-s_i'$
          \STATE $q.\mathtt{push}(i)$
          \IF {$1-\exp(-2\beta\sigma H)>\mathop{\mathtt{UniformRandom}}(0,1)$}
            \STATE \texttt{completed} $\gets$ \textbf{false}
            \STATE \textbf{break}
          \ENDIF
        \ENDIF
      \ENDWHILE
      \IF {completed}
        \STATE $s\gets s'$
      \ENDIF
    \end{algorithmic}
    \caption{Hybrid Wolff/Metropolis--Hastings}
    \label{alg:h.wolff}
  \end{algorithm}
\end{figure}

\section{Cluster-flip in a Field}

Consider the new graph $\tilde G=(\tilde V,\tilde E)$ defined from $G$ by
\begin{align}
  \tilde V&=\{0\}\cup V\\
  \tilde E&=E\cup\{\{0,i\}\mid i\in V\},
\end{align}
that is, by adding a zeroth vertex and an edge from every other vertex to it. The network of spins described by this graph now has states that occupy the configuration space $S^{n+1}$. Extend the coupling between spins to $\tilde J:\tilde E\to\R$, defined by
\[
  \tilde J_{ij}=\begin{cases}
    J_{ij} & i,j>0\\
    H & \text{otherwise},
  \end{cases}
\]
so that the original spins are coupled to each other exactly as before, and each is coupled to the new spin with strength $H$. Now define a Hamiltonian $\tilde\H:S^{n+1}\to\R$ on this new, larger configuration space by
\[
  \tilde\H(s)=-\sum_{\{i,j\}\in\tilde E}\tilde J_{ij}s_is_j.
\]
This new Hamiltonian resembles the old one, but consists only of spin--spin interactions, with no external field. However, by regrouping the terms in the sum we may write
\[
  \tilde\H(s)=-\sum_{\{i,j\}\in E}J_{ij}s_is_j-H\tilde M(s),
\]
where the new magnetization $\tilde M:S^{n+1}\to\R$ is defined for $s=(s_0,s_1,\ldots,s_n)\in S^{n+1}$ by
\[
  \tilde M(s)=s_0\sum_{i\in V}s_i=M(s_0\times(s_1,\ldots,s_n)).
\]
In fact, any observable $A$ of the original system can be written as an observable $\tilde A$ of the new system by defining
\[
  \tilde A(s)=A(s_0\times(s_1,\ldots,s_n)),
\]
and the expected value of $\tilde A$ in the new system equals that of $A$ in the old one.
This can be seen by using the facts that $\tilde\H(s)=\tilde\H(-s)$, that $\sum_{s\in S^n}f(s)=\sum_{s\in S^n}f(-s)$ for any $f:S^n\to\R$, and that $\tilde\H(1,s)=\H(s)$ for $s\in S^n$, from which it follows that
\[
  \begin{aligned}
    \tilde Z\avg{\tilde A}
    &=\sum_{s\in S^{n+1}}\tilde A(s)e^{-\beta\tilde\H(s)}
    =\sum_{s_0\in S}\sum_{s\in S^n}\tilde A(s_0,s)e^{-\beta\tilde\H(s_0,s)}\\
    &=\bigg(\sum_{s\in S^n}A(s)e^{-\beta\tilde\H(1,s)}
    +\sum_{s\in S^n}A(-s)e^{-\beta\tilde\H(-1,s)}\bigg)\\
    &=\bigg(\sum_{s\in S^n}A(s)e^{-\beta\tilde\H(1,s)}
    +\sum_{s\in S^n}A(-s)e^{-\beta\tilde\H(1,-s)}\bigg)\\
    &=\bigg(\sum_{s\in S^n}A(s)e^{-\beta\tilde\H(1,s)}
    +\sum_{s\in S^n}A(s)e^{-\beta\tilde\H(1,s)}\bigg)\\
    &=\bigg(\sum_{s\in S^n}A(s)e^{-\beta\H(s)}
    +\sum_{s\in S^n}A(s)e^{-\beta\H(s)}\bigg)\\
    &=2Z\avg A.
  \end{aligned}
\]
An identical calculation with $A=1$ shows that $\tilde Z=2Z$, from which it immediately follows that $\avg{\tilde A}=\avg A$. Notice that this correspondence also holds for the Hamiltonian itself, since $\tilde\H(s)=\H(s_0\times(s_1,\ldots,s_n))$.

Our new spin system with its additional ghost spin is, when $H$ is greater than zero, simply a ferromagnetic spin system in the absence of an external field. Therefore, the Wolff algorithm can be applied to it with no modification. Since there is an exact correspondence between the expectation values of our ordinary spin system in a field and those of their appropriately defined counterparts in the new system, sampling Monte Carlo simulations of the new system allows us to estimate expectation values for the old one. This ``new'' algorithm, if you can call it that, is shown in Algorithm \ref{alg:wolff-field}.

\begin{figure}
  \begin{algorithm}[H]
    \begin{algorithmic}
      \REQUIRE $s\in S^{n+1}$
      \STATE \textbf{let} $q$ be an empty stack
      \STATE \textbf{let} $i_0\in V$
      \STATE $q.\mathtt{push}(i_0)$
      \STATE $\sigma\gets s_{i_0}$
      \WHILE {$q$ is not empty}
        \STATE $i\gets q.\mathtt{pop}$
        \IF {$s_i=\sigma$}
          \FORALL {$j$ such that $\{i,j\}\in\tilde E$}
            \IF {$1-\exp(-2\beta \tilde J_{ij}s_is_j)>\mathop{\mathtt{UniformRandom}}(0,1)$}
              \STATE $q.\mathtt{push}(j)$
            \ENDIF
          \ENDFOR
          \STATE $s_i\gets-s_i$
        \ENDIF
      \ENDWHILE
    \end{algorithmic}
    \caption{Wolff (Nonzero Field)}
    \label{alg:wolff-field}
  \end{algorithm}
\end{figure}

At sufficiently small $H$, both our algorithm and the hybrid Wolff--Metropolis reduce to the ordinary Wolff algorithm and share its runtime properties. At very large $H$, the hybrid Wolff--Metropolis behaves like a single-spin-flip Metropolis algorithm: clusters are almost always abandoned after the first spin is added, and that spin is flipped with probability $\sim e^{-2\beta H}$, since the energy is dominated by contributions from the field.

We measured the autocorrelation time $\tau$ of the internal energy $\H$ of a square-lattice Ising model ($J=1$ and $E$ the set of nearest-neighbor pairs) resulting from each of these three algorithms at various fields and temperatures. This was done using a batch-mean estimator. Time was measured as ``the number of spins that the algorithm has attempted to flip.'' For example, every Metropolis--Hastings step takes unit time, every Wolff step takes time equal to the number of spins in the flipping cluster, and every hybrid Wolff/Metropolis--Hastings step takes time equal to the number of spins in the cluster the step attempted to flip, whether it succeeds or not. The autocorrelation time as a function of the reduced field $\beta H$ is shown in Fig.~\ref{fig:correlation}. By this measure of time, the modified Wolff algorithm thermalizes faster than both the hybrid Wolff/Metropolis and ordinary Metropolis--Hastings in all regimes.
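For concreteness, the following is a minimal Python sketch of a single update of Algorithm~\ref{alg:wolff-field}, specialized to a periodic square lattice with uniform coupling $J\geq0$ and field $H\geq0$; the function and variable names are ours and purely illustrative.
\begin{verbatim}
import numpy as np

def wolff_field_step(s, s0, beta, J=1.0, H=0.0, rng=None):
    """One ghost-spin Wolff cluster flip on an L x L
    periodic square lattice, assuming J, H >= 0.  s is the
    array of lattice spins and s0 the ghost spin."""
    rng = rng or np.random.default_rng()
    L = s.shape[0]
    p_bond = 1.0 - np.exp(-2.0 * beta * J)   # lattice bonds
    p_ghost = 1.0 - np.exp(-2.0 * beta * H)  # ghost bonds
    i0 = tuple(rng.integers(L, size=2))
    sigma = s[i0]        # orientation of the seed spin
    s[i0] = -sigma       # spins are flipped as they join
    stack = [i0]
    while stack:
        i, j = stack.pop()
        # attempt the four lattice bonds of site (i, j)
        for k in ((i + 1) % L, j), ((i - 1) % L, j), \
                 (i, (j + 1) % L), (i, (j - 1) % L):
            if s[k] == sigma and rng.random() < p_bond:
                s[k] = -sigma
                stack.append(k)
        # attempt the bond from (i, j) to the ghost spin
        if s0 == sigma and rng.random() < p_ghost:
            s0 = -sigma
            # the ghost neighbors every site, so attempt
            # its remaining bonds right away
            for k in zip(*np.nonzero(s == sigma)):
                if rng.random() < p_ghost:
                    s[k] = -sigma
                    stack.append(k)
    return s, s0
\end{verbatim}
When $H=0$ the ghost bond is never activated and this reduces to the ordinary Wolff update; estimates of observables of the original system are accumulated from $s_0\times(s_1,\ldots,s_n)$ as described above.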
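The batch-mean estimate of $\tau$ referred to above can be sketched as follows; this is one common convention, it assumes the measurement series is much longer than the correlation time, and the number of batches is an arbitrary illustrative choice.
\begin{verbatim}
import numpy as np

def batch_mean_tau(x, n_batches=32):
    """Batch-mean estimate of the integrated autocorrelation
    time of a measurement series x (e.g. the energy, recorded
    at regular intervals of attempted spin flips)."""
    x = np.asarray(x, dtype=float)
    m = len(x) // n_batches        # samples per batch
    xb = x[:m * n_batches].reshape(n_batches, m)
    means = xb.mean(axis=1)
    # var(batch mean) ~ (2 tau / m) var(sample) for m >> tau
    return 0.5 * m * means.var(ddof=1) / x.var(ddof=1)
\end{verbatim}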
At high field ($\beta H\sim1$), the new algorithm has an autocorrelation time of roughly $n$, while both Metropolis--Hastings and the hybrid algorithm have an autocorrelation time of $2n$. At low temperature, neither the new algorithm nor Metropolis--Hastings has an autocorrelation time that depends on the field, but the hybrid method performs very poorly at intermediate values of the field, in a way that becomes worse with system size and that scales worse than the hybrid algorithm does even at the critical temperature. At high temperature, all three algorithms perform similarly and in a way that is independent of system size. At the critical temperature, both the Metropolis--Hastings and hybrid algorithms experience critical slowing down in a similar way, whereas the slowing down of the new algorithm is far less severe: it always performs more efficiently at nonzero field than the zero-field Wolff algorithm.

\begin{figure}
  \centering
  \input{fig_correlation}
  \caption{The autocorrelation time $\tau$ of the internal energy $\H$ for an $n=32\times32$ square-lattice Ising model simulated by the three nonzero-field methods detailed in this paper. The top plot is for simulations at high temperature, $T=5.80225$, the middle plot is for simulations at the critical temperature, $T=2.26919$, and the bottom plot is for simulations at low temperature.}
  \label{fig:correlation}
\end{figure}

\begin{figure}
  \centering
  \input{fig_correlation_collapse-hL}
\end{figure}

\section{Magnetization Estimator}

At any system size, the ensemble average magnetization $\avg M$ is identically zero at zero field. However, this is not what is observed at low temperature. This is because in the low-temperature phase the ergodic hypothesis---that the time-average value of observables is equal to their ensemble average---is violated. As the system size grows, the likelihood that any reasonable dynamics produces a fluctuation flipping the magnetization from one direction to the other becomes vanishingly small, and therefore it is inappropriate to estimate expected values in the low-temperature phase by averaging over the whole configuration space. Instead, values must be estimated by averaging over the portion of configuration space that is accessible to the dynamics.

For finite-size systems, like any we would simulate, dynamics at zero field or even at small nonzero field do allow the whole configuration space to be explored. However, one usually wants to use the results from finite-size systems to estimate expected values in the thermodynamic limit, where this is no longer true. At zero field, for instance, it is common practice to use $\avg{|M|}$ instead of $\avg M$ to estimate the expected value of the magnetization. But what to do at finite field? And is this approach justified? Since, in the thermodynamic limit, expected values are given by an average over a restricted configuration space, we can estimate those expected values at finite size by making the same restriction. Define the reduced configuration spaces
\begin{align}
  S_\e^n&=\{s\in S^n\mid \sgn(H)M(s)>0\} \\
  S_\m^n&=\{s\in S^n\mid \sgn(H)M(s)<0\} \\
  S_0^n&=\{s\in S^n\mid \sgn(H)M(s)=0\}
\end{align}
where
\[
  \sgn(H)=\begin{cases}1&H\geq0\\-1&H<0.\end{cases}
\]
Clearly $S^n=S^n_\e\cup S^n_\m\cup S^n_0$, which implies that $Z=Z_\e+Z_\m+Z_0$ for the correspondingly restricted partition functions. Since $S^n_\e=\{-s\mid s\in S^n_\m\}$, $|S^n_\e|=|S^n_\m|$. At zero field, $\H(s)=\H(-s)$, and therefore $Z_\e=Z_\m$.
This can be used to show the near-equivalence (at zero field) of taking the expectation value $\avg{|M|}$ of the absolute value of the magnetization and taking the expectation value $\eavg M$ of the magnetization on the reduced configuration space, since
\begin{align}
  \avg{|M|}
  &=\frac1Z\sum_{s\in S^n}e^{-\beta\H(s)}|M(s)|\\
  &=\frac1{Z_\e+Z_\m+Z_0}\bigg(\sum_{s\in S^n_\e}e^{-\beta\H(s)}|M(s)|+
  \sum_{s\in S^n_\m}e^{-\beta\H(s)}|M(s)|+\sum_{s\in S^n_0}e^{-\beta\H(s)}|M(s)|\bigg)\\
  &=\frac1{2Z_\e+Z_0}\bigg(\sum_{s\in S^n_\e}e^{-\beta\H(s)}M(s)+
  \sum_{s\in S^n_\e}e^{-\beta\H(-s)}|M(-s)|\bigg)\\
  &=\frac2{2Z_\e+Z_0}\sum_{s\in S^n_\e}e^{-\beta\H(s)}M(s)\\
  &=\frac1{1+\frac{Z_0}{2Z_\e}}\eavg M.
\end{align}
At infinite temperature, $Z_0/Z_\e\simeq n^{-1/2}\sim L^{-1}$ for a large square system of $n=L\times L$ sites. At any finite temperature, and especially in the ferromagnetic phase, $Z_0\ll Z_\e$ in a much more extreme way.

If the ensemble average over only positive magnetizations can be said to converge to the equilibrium state of the Ising model in the thermodynamic limit, what of the average over only \emph{negative} magnetizations, that is, over the ordinarily unphysical space $S^n_\m$? This is, in one sense, precisely the definition of the metastable state of the Ising model. Expected values of observables in the metastable state can therefore be computed by considering this reduced ensemble.

How, in practice, are these samplings over reduced ensembles performed? There may be efficient algorithms for sampling directly within a particular reduced ensemble, but for finite-size systems any of the algorithms described here explores the whole configuration space. Any of them can nonetheless be used to sample all of the reduced configuration spaces simultaneously, by letting it sample the whole space and only adding a contribution to a given average when the sampled state lies in the reduced space of interest.

\begin{acknowledgments}
  Thanks!
\end{acknowledgments}

\bibliography{monte-carlo}

\end{document}