<p>Homotopico: Loosely-threaded thoughts [mostly] about mathematics and physics.</p>
<h1>Geometric Quantization is Slowly Driving Me Insane</h1>
<p>Santiago Quintero de los Ríos, 2021-09-24</p>
<p>Get this post on <a href="/assets/docs/pdf_posts/geometric-quantization.pdf">pdf here</a>.</p>
<p>Quantization is the process that transforms a classical physical system
into its quantum counterpart. As done by most physicists, quantization
is quite <em>ad hoc</em> and not very rigorous. There’s a well-defined
mathematical structure of classical mechanics (which is generalized into
symplectic geometry), and there’s a well-defined structure of
non-relativistic quantum mechanics (that of Hermitian operators on a
Hilbert space); however, how one takes a classical object (be it state
or observable) into its quantum counterpart is often not very clear.</p>
<p align="middle">
<img src="/assets/posts/geometric-quantization/geometric-quantization.png" width="80%" />
</p>
<p>In this post, we start a dive into <em>one</em> of the few ways mathematicians
have tried to make quantization rigorous: <strong>geometric quantization</strong>.
Its name comes from the fact that the quantization method arises
somewhat naturally from the symplectic structure of phase space.</p>
<p>Before we look at geometric quantization, let’s look at quantization of
classical phase space, as done by physicists. That way we can see what
we want from a theory of quantization.</p>
<p>Consider the <em>phase space</em> of a particle, for simplicity in one spatial
dimension. It has a position coordinate $q$ and a momentum coordinate
$p$, and their Poisson bracket<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> is</p>
\[\left\{q,p\right\} = \frac{\partial q}{\partial q}\frac{\partial p}{\partial p} - \frac{\partial p}{\partial q}\frac{\partial q}{\partial p} = 1.\]
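<p>As a quick sanity check, brackets like this one are easy to compute symbolically; here is a minimal sympy sketch (the helper name <code>poisson</code> is mine, not a sympy builtin):</p>

```python
import sympy as sp

q, p = sp.symbols('q p')

def poisson(f, g):
    # Canonical Poisson bracket on a 1D phase space with coordinates (q, p)
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

print(poisson(q, p))     # 1, the fundamental bracket
print(poisson(q**2, p))  # 2*q
```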
<p><strong>Canonical quantization</strong> (or Heisenberg quantization) consists of
turning $q$ and $p$ into Hermitian <em>operators</em> $\hat{q},\hat{p}$ on a
Hilbert space $\mathcal{H}$, which satisfy the <em>canonical commutation
relations</em></p>
\[[\widehat{q},\widehat{p}] = i\hslash.\]
<p>So in brief, in physics we quantize by
putting little hats on top of observables, declaring that they are
Hermitian operators now, and imposing the canonical commutation
relations.</p>
<p>More generally, we want to promote all functions $f(q,p)$ on phase space
to Hermitian <em>operators</em> on some Hilbert space $\mathcal{H}$, in a way
that respects the Poisson bracket. That is, for two functions $f$, $g$,
we want their quantizations $\hat{f}$, $\hat{g}$ to satisfy</p>
\[\widehat{\left\{f,g\right\}} = \frac{-i}{\hslash}[\hat{f},\hat{g}].\]
<p>How do we obtain the quantization of such a function? Well, starting from
the quantization of $q$ and $p$, we can quantize “all” functions that
depend on only one of the variables $q$ or $p$, by “quantizing” each
term of their Taylor expansion. For example, if we have a function
$f(q)$, then its quantization is</p>
\[f(q) = \sum_{n=0}^\infty a_nq^n \mapsto \hat{f} = \sum_{n=0}^\infty a_n\hat{q}^n.\]
<p>In practice, quantizing all functions of only $p$ and only $q$ gets us
quite far, since we are mostly interested in the Hamiltonian function,
which is very often of the form</p>
\[H(q,p) = \frac{p^2}{2m} + V(q),\]
<p>where $V$ is a potential function that only depends on position. But
what if the function we want to quantize depends both on $q$ and $p$ in
a nontrivial way? Well... things get hairy, and there are several,
often non-equivalent, ways to deal with that.</p>
<p>Another question we have right now is... what is the Hilbert space
$\mathcal{H}$? Here is where we connect with Schrödinger’s wave
mechanics. The Hilbert space is $L^2(\mathbb{R})$, the space of
square-integrable functions on the real line, and the operators
$\hat{q}$, $\hat{p}$ have an explicit representation:</p>
\[\begin{aligned}
(\hat{q}\psi)(x) &= x\psi(x)\\
(\hat{p}\psi)(x) &= -i\hslash\frac{\mathrm{d}\psi}{\mathrm{d}x}.\end{aligned}\]
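<p>We can verify the canonical commutation relation for this representation directly, by applying $[\hat{q},\hat{p}]$ to an arbitrary test wavefunction (a sympy sketch; <code>hbar</code> is just a positive symbol standing in for $\hslash$):</p>

```python
import sympy as sp

x = sp.symbols('x')
hbar = sp.symbols('hbar', positive=True)
psi = sp.Function('psi')(x)

q_hat = lambda f: x * f                         # position: multiply by x
p_hat = lambda f: -sp.I * hbar * sp.diff(f, x)  # momentum: -i*hbar d/dx

# [q, p] psi = q(p(psi)) - p(q(psi))
comm = sp.simplify(q_hat(p_hat(psi)) - p_hat(q_hat(psi)))
print(comm)  # I*hbar*psi(x), i.e. [q, p] = i*hbar as an operator identity
```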
<p>Following all these prescriptions, we have, for example, that the
Hamiltonian of a free particle gets quantized as</p>
\[\hat{H} = -\frac{\hslash^2}{2m}\frac{\mathrm{d}^2}{\mathrm{d}x^2}.\]
<p>And the Hamiltonian of the harmonic oscillator with elastic constant $k$
gets quantized as</p>
\[\hat{H} = -\frac{\hslash^2}{2m}\frac{\mathrm{d}^2}{\mathrm{d}x^2} + \frac{1}{2}kx^2,\]
<p>where the second term is understood as <em>multiplying</em> by $x^2$.</p>
<h1 id="our-wildest-quantum-dreams">Our wildest quantum dreams</h1>
<p>Looking back on the previous section, we can try to fit all these into
some <em>axioms</em> of quantization. What do we want from a <strong>quantization
scheme</strong>?</p>
<p>We want a procedure that takes <em>functions</em> on phase space, and turns
them into Hermitian operators on a Hilbert space. If we consider the
phase space in $n$ dimensions, what we want then is a <em>map</em>
$C^\infty(\mathbb{R}^n\times\mathbb{R}^n)\to \operatorname{Herm}(\mathcal{H})$,
which we denote as $f\mapsto \hat{f}$. The first obvious condition is
that it should turn Poisson brackets into commutation relations (times
$i\hslash$). We can express this as:</p>
<p><strong>Axiom 1:</strong> For any pair of functions $f$,
$g\in C^\infty(\mathbb{R}^n\times\mathbb{R}^n)$, we have the <em>canonical
commutation relations</em></p>
\[[\widehat{f},\widehat{g}] =i\hslash\widehat{\left\{f,g\right\}}.\]
<p>This
axiom is not enough to give us the canonical commutation relations of
$p$ and $q$: we need to ensure that the quantization of the constant
function $1$ is precisely the identity $I$. We introduce this as a
second axiom:</p>
<p><strong>Axiom 2:</strong> The quantization of the constant function $1$ is the
identity $I$:</p>
\[\hat{1} = I.\]
<p>Another axiom that is clearly needed is that the quantization scheme should
be linear. This is an implicit assumption when we do canonical
quantization!</p>
<p><strong>Axiom 3:</strong> The quantization scheme is linear.</p>
<p>The next axiom is the one that tells us that we can quantize functions
by quantizing their Taylor expansions term by term. In most
cases, we don’t even need to go that far: we just want the quantization
scheme to respect the polynomial algebra generated by functions! That
is:</p>
<p><strong>Axiom 4</strong>: For any function $f$ and all integers $n\geq 0$:</p>
\[\widehat{(f^n)} = (\widehat{f})^n.\]
<p>The last axiom doesn’t show up explicitly in canonical quantization,
but it implies that the quantization of the functions $p$ and $q$ are
the standard ones on $L^2(\mathbb{R})$. This is a condition imposing the
irreducibility of the representation of the algebra of functions on the
Hilbert space:</p>
<p><strong>Axiom 5</strong>: The only subspaces $W\subseteq \mathcal{H}$ that are stable
under the action of all the quantizations of the position and momentum
functions are $0$ and $\mathcal{H}$. That is to say: if
$\hat{q}(W)\subseteq W$ and $\hat{p}(W)\subseteq W$, then $W=0$ or
$W=\mathcal{H}$.</p>
<p>By the Stone–von Neumann theorem, this, along with the canonical
commutation relations of $\hat{q}$ and $\hat{p}$, implies their standard
representations in $L^2(\mathbb{R})$.</p>
<h1 id="our-dreams-shattered">Our dreams shattered</h1>
<p>Welp, it turns out that the axioms above are too much to ask of a
quantization scheme. As an example, let’s consider the function $pq$.
Remember that we don’t know how to quantize products of $p$ and $q$, but
if we rewrite it as</p>
\[pq = \frac{1}{2}((p+q)^2-p^2-q^2),\]
<p>we know how
to quantize all the terms on the right-hand side, using axioms 2 and 4:</p>
\[\widehat{pq} = \frac{1}{2}((\hat{p}+\hat{q})^2 - \hat{p}^2-\hat{q}^2) = \frac{1}{2}(\hat{p}\hat{q}+\hat{q}\hat{p}).\]
<p>So far, so good. In a similar fashion, we get</p>
\[\widehat{p^2q^2} = \frac{1}{2}(\hat{p}^2\hat{q}^2+\hat{q}^2\hat{p}^2).\]
<p>But here we run into trouble. From axiom 4, we should have</p>
\[\widehat{p^2q^2} = \widehat{(pq)^2} = \widehat{pq}^2,\]
<p>but</p>
\[\widehat{pq}^2\neq \frac{1}{2}(\hat{p}^2\hat{q}^2+\hat{q}^2\hat{p}^2).\]
<p>So the scheme is inconsistent! And it turns out (see Ali & Engliš) that there are
many more inconsistencies between these axioms.</p>
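<p>We can see the failure concretely in the Schrödinger representation: applying the two candidate quantizations of $p^2q^2$ to an arbitrary wavefunction, they differ by a nonzero multiple of $\psi$ (by my hand computation, $-\tfrac{3}{4}\hslash^2\psi$). A sympy sketch:</p>

```python
import sympy as sp

x = sp.symbols('x')
hbar = sp.symbols('hbar', positive=True)
psi = sp.Function('psi')(x)

q = lambda f: x * f                         # position operator
p = lambda f: -sp.I * hbar * sp.diff(f, x)  # momentum operator

# Quantization of pq obtained from axioms 2 and 4: (pq + qp)/2
pq_hat = lambda f: (p(q(f)) + q(p(f))) / 2

sym = (p(p(q(q(psi)))) + q(q(p(p(psi))))) / 2  # (p^2 q^2 + q^2 p^2)/2
sqr = pq_hat(pq_hat(psi))                      # axiom 4 applied to (pq)^2

print(sp.simplify(sym - sqr))  # = -(3/4) * hbar**2 * psi(x): the axioms clash
```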
<p>So... why isn’t this a big problem in physics? It’s because we <em>don’t
care</em> about the whole Poisson algebra of $C^\infty(\mathbb{R}^{n}\times\mathbb{R}^n)$! In practice, we only care
about quantizing certain specific functions on phase space: energies,
momenta, etc. These have explicit expressions which are often polynomials of low degree in $p$ and $q$, without any cross-terms like
$pq$, which are the ones that bring trouble. Even more, if we have
cross-terms as in the angular momentum</p>
\[\mathbf{L} = \mathbf{q}\times\mathbf{p} = (q_yp_z-q_zp_y)\mathbf{e}_x + (q_zp_x-q_xp_z)\mathbf{e}_y + (q_xp_y-q_yp_x)\mathbf{e}_z,\]
<p>the factors $q_ip_j$ all have $i\neq j$, and such pairs Poisson-commute, since
the canonical Poisson brackets are</p>
\[\left\{q_i,p_j\right\} = \delta_{ij}.\]
<p>So in the end there is no trouble.</p>
<h1 id="what-is-our-objective-then">What is our objective, then?</h1>
<p>We know that it’s impossible to find a quantization scheme satisfying
all the axioms, so what is our goal? At the outset, we need to relax
some conditions.</p>
<p>First, we will not try to quantize the entire Poisson algebra of smooth
functions on phase space. Instead, we will focus on a (possibly very
small) subalgebra of <em>quantizable observables</em>, which we call
$\mathrm{Obs}\subset C^\infty(\mathbb{R}^n\times\mathbb{R}^n)$.
Furthermore, we will <em>not</em> require Axioms 4 and 5 anymore, which seem to
be the most restrictive.</p>
<p>And even better, we don’t have to focus on the standard phase space
$\mathbb{R}^n\times \mathbb{R}^n$, but instead we can try to find a
quantization scheme for any symplectic manifold.</p>
<h1 id="the-geometric-viewpoint-of-canonical-quantization">The geometric viewpoint of canonical quantization</h1>
<p>Let’s put our geometer hats on and go back to the canonical quantization
of $M=\mathbb{R}^n\times\mathbb{R}^n$. We’ll do some half guessing, half
reverse-engineering to try to find:</p>
<ol>
<li>
<p>A subalgebra of quantizable observables $\mathrm{Obs}$,</p>
</li>
<li>
<p>a Hilbert space of quantum states $\mathcal{H}$,</p>
</li>
<li>
<p>and a quantization scheme
$\mathrm{Obs}\to\operatorname{Herm}(\mathcal{H})$ satisfying axioms
1, 2, and 3,</p>
</li>
</ol>
<p>which hopefully coincides with the standard canonical quantization.
Along the way, we’ll also be thinking about how to write these ideas in
a coordinate-free way, so that we can generalize to symplectic
manifolds.</p>
<p>Our guiding light is the first axiom: For any pair of functions $f$, $g$
in our (still undetermined) space of observable functions, we want their
quantizations to satisfy the commutation relations</p>
\[[\widehat{f},\widehat{g}]= i\hslash\widehat{\left\{f,g\right\}}.\]
<p>Even more, we expect the
quantized operators to act somewhat like differential operators on a
space of functions (<em>wave</em>functions). This rings a symplectic bell!
Recall that any smooth function on a symplectic manifold $(M,\omega)$
has an associated Hamiltonian vector field $X_f$ defined as</p>
\[\iota_{X_f}\omega = \mathrm{d}{f}.\]
<p>These vector fields are indeed operators acting on functions, and furthermore, they satisfy</p>
\[X_{\left\{f,g\right\}} = -[X_f,X_g],\]
<p>where $\{f,g\} = X_g[f]$ is the standard Poisson structure induced
by the symplectic structure. This tells us that a naïve, but good, start
for a quantization rule is</p>
\[\hat{f} = -i\hslash X_f,\]
<p>since then we have</p>
\[[\widehat{f},\widehat{g}]= -\hslash^2[X_f,X_g]=\hslash^2X_{\left\{f,g\right\}} = i\hslash(-i\hslash X_{\left\{f,g\right\}}) = i\hslash\widehat{\left\{f,g\right\}}.\]
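<p>This bracket identity, and with it the computation above, can be checked symbolically for arbitrary smooth $f$, $g$ (a sympy sketch in one degree of freedom; the helper names are mine):</p>

```python
import sympy as sp

q, p = sp.symbols('q p')
f = sp.Function('f')(q, p)
g = sp.Function('g')(q, p)
u = sp.Function('u')(q, p)  # arbitrary test function to act on

def X(h, w):
    # Hamiltonian vector field X_h applied to w
    return sp.diff(h, p) * sp.diff(w, q) - sp.diff(h, q) * sp.diff(w, p)

def poisson(a, b):
    # {a, b} = X_b[a], in coordinates
    return sp.diff(a, q) * sp.diff(b, p) - sp.diff(a, p) * sp.diff(b, q)

lhs = X(poisson(f, g), u)               # X_{{f,g}}[u]
rhs = -(X(f, X(g, u)) - X(g, X(f, u)))  # -[X_f, X_g][u]
print(sp.simplify(sp.expand(lhs - rhs)))  # 0
```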
<p>So axiom $1$ is satisfied, but not axiom $2$, since the Hamiltonian
vector field associated to a constant function is just $0$, so we need
to tweak the quantization scheme. The first dumb guess is to add a
multiplication by the original function, i.e.</p>
\[\hat{f} \overset{?}{=} -i\hslash X_f + f,\]
<p>so that the quantization of $1$ is now the identity. However, as we see in the
gory details below, we break the commutation rules:</p>
\[[\widehat{f},\widehat{g}]= i\hslash\left(\widehat{\left\{f,g\right\}} + \left\{f,g\right\}\right)\]
<p>One way to fix this is by choosing a <em>symplectic potential</em> $\theta$ for
the symplectic form $\omega$; i.e. a form that satisfies
$-\mathrm{d}\theta = \omega$, and defining the quantization of an
observable as</p>
\[\hat{f} = -i\hslash X_f + f - \theta(X_f).\]
<p>This
tweaking goes back to Segal, 1960, but it is left unmotivated. It is
very likely that it was obtained by educated guessing and
trial-and-error. We show that it is a good quantization scheme in the
gory details below.</p>
<p>What does this look like in $\mathbb{R}^{n}\times\mathbb{R}^n$? If we
choose global coordinates $q^\mu$, $p_\mu$, the symplectic form is</p>
\[\omega = \mathrm{d}{q}^\mu\wedge\mathrm{d}{p_\mu},\]
<p>and the
Hamiltonian vector field associated to a function
$f\in C^\infty(\mathbb{R}^n\times\mathbb{R}^n)$ is</p>
\[X_f = \frac{\partial f}{\partial p_\mu}\frac{\partial }{\partial q^\mu} - \frac{\partial f}{\partial q^\mu}\frac{\partial }{\partial p_\mu},\]
<p>so that the Poisson bracket of two functions is</p>
\[\left\{f,g\right\} = \frac{\partial f}{\partial q^\mu}\frac{\partial g}{\partial p_\mu} - \frac{\partial f}{\partial p_\mu}\frac{\partial g}{\partial q^\mu}.\]
<p>A symplectic potential is given by the <em>tautological</em> form, which is
defined in these coordinates as<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> \(\theta = p_\mu\mathrm{d}{q^\mu}.\)
It is straightforward to show that this quantization scheme is
explicitly</p>
\[\hat{f} = -i\hslash\left(\frac{\partial f}{\partial p_\mu}\frac{\partial }{\partial q^\mu} - \frac{\partial f}{\partial q^\mu}\frac{\partial }{\partial p_\mu} \right) + f -p_\mu\frac{\partial f}{\partial p_\mu}.\]
<p>Alright! So we’re done! Let’s see what happens to the coordinate
functions. For the momenta $p_\mu$, we have</p>
\[\hat{p}_\mu = -i\hslash\frac{\partial }{\partial q^\mu}.\]
<p>Perfect!
And for the positions $q^\mu$:</p>
\[\hat{q}^\mu = i\hslash\frac{\partial }{\partial p_\mu} +q^\mu.\]
<p>Ah fuck. It’s okay, we can fix it. In fact, there is <em>nothing</em> wrong here!
You see, these are operators that have to act on some vector space. In
canonical quantization, this is the space of complex-valued,
square-integrable wavefunctions $L^2(\mathbb{R}^n)$, which consists of
functions only of the $q$ variables! So the leftover term in
$\hat{q}^\mu$ is actually zero, since the wavefunctions do not depend on
$p$.</p>
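<p>This scheme is easy to implement and check in coordinates; a minimal sympy sketch (one degree of freedom, with the tautological potential $\theta = p\,\mathrm{d}q$; <code>quantize</code> is my own helper name):</p>

```python
import sympy as sp

q, p = sp.symbols('q p')
hbar = sp.symbols('hbar', positive=True)
psi = sp.Function('psi')(q, p)  # a wavefunction on phase space

def quantize(f, w):
    # f-hat w = -i*hbar X_f[w] + (f - theta(X_f)) w, with theta = p dq
    Xf_w = sp.diff(f, p) * sp.diff(w, q) - sp.diff(f, q) * sp.diff(w, p)
    theta_Xf = p * sp.diff(f, p)  # theta(X_f) = p * df/dp
    return -sp.I * hbar * Xf_w + (f - theta_Xf) * w

print(quantize(p, psi))  # = -i*hbar d(psi)/dq
print(quantize(q, psi))  # = i*hbar d(psi)/dp + q*psi
```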
<p>So our first attempt at a rigorous <em>canonical quantization</em> is the
following:</p>
<ol>
<li>
<p>The symplectic manifold is $\mathbb{R}^n\times\mathbb{R}^n$, with
the standard symplectic structure.</p>
</li>
<li>
<p>The quantum state space is $L^2(\mathbb{R}^n)$, defined in the $q$
variables only.</p>
</li>
<li>
<p>The quantization of an observable
$f\in C^{\infty}(\mathbb{R}^n\times\mathbb{R}^n)$ is given
by</p>
\[\hat{f} = -i\hslash X_f + f - \theta(X_f),\]
<p>where $\theta$ is the tautological form.</p>
</li>
</ol>
<p>Of course, there are a few problems there if we try to go to general
symplectic manifolds. First of all, there might not be a global choice
of a symplectic potential, so the quantization of an observable might
change as we move around the manifold. In fact, even locally the choice
of a potential is not unique. No bueno. Furthermore, in a general
manifold there isn’t a canonical separation of position and momenta: we
just have an even number of coordinates. So how do we make a
coordinate-free definition of a function depending “only on the $q$
coordinates”?</p>
<p>Let’s tackle the first problem.</p>
<h1 id="gauge-symmetries">Gauge symmetries</h1>
<p>The quantization of observables requires a choice of a symplectic
potential. In the case of canonical quantization, there is a canonical
choice given by the tautological form.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> But in the general case of
symplectic manifolds,<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> symplectic potentials exist at best locally,
and the choice is far from unique. Can we get a quantization of
observables that is independent from the choice of symplectic potential?</p>
<p>Suppose that we have two symplectic potentials $\theta$ and $\theta’$,
defined locally on some open sets $U$ and $U’$, such that on the
intersection $U\cap U’$ we have</p>
\[\theta' = \theta + \mathrm{d}{u}.\]
<p>This, of course, changes the quantization of an observable:</p>
\[\hat{f}'= -i\hslash X_f + f - \theta'(X_f) = \hat{f} - \mathrm{d}{u}(X_f).\]
<p>Consequently, the action of $\hat{f}’$ on a wavefunction $\psi$ will
differ from that of $\hat{f}$. But all is not lost! In quantum
mechanics, we don’t really care about wavefunctions, but instead the
expectation values of observables:</p>
\[\left\langle\psi\middle\vert\hat{O}\psi\right\rangle.\]
<p>That means that if we change the observables <em>and</em> the wavefunctions such that the
expectation values are preserved, then all is well!</p>
<p>I’ve discussed this idea previously in another post, but the gist is the
following. If we change the wavefunction as</p>
\[\psi' = e^{\frac{i}{\hslash}u}\psi,\]
<p>then it is straightforward to
check that the action of $\hat{f}’$ on $\psi’$ satisfies</p>
\[\hat{f}'\psi' = (\hat{f}\psi)'.\]
<p>That is, if we transform the
operator and apply it to the transformed wavefunction, we get the same
thing as applying the original operator to the original wavefunction and
<em>then</em> transforming. Furthermore, we see that the operators change as</p>
\[\hat{f}' = e^{\frac{i}{\hslash}u}\hat{f}e^{-\frac{i}{\hslash}u},\]
<p>and
so the expectation values are preserved exactly:</p>
\[\left\langle\psi'\middle\vert\hat{f}'\psi'\right\rangle = \left\langle\psi\middle\vert\hat{f}\psi\right\rangle.\]
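<p>This equivariance can be verified symbolically: shifting the potential by $\mathrm{d}u$ and the wavefunction by the phase $e^{\frac{i}{\hslash}u}$ gives $\hat{f}'\psi' = (\hat{f}\psi)'$. A sympy sketch (one degree of freedom, $\theta = p\,\mathrm{d}q$; the helper names are mine):</p>

```python
import sympy as sp

q, p = sp.symbols('q p')
hbar = sp.symbols('hbar', positive=True)
f = sp.Function('f')(q, p)
u = sp.Function('u')(q, p)    # gauge function: theta' = theta + du
psi = sp.Function('psi')(q, p)

def X(h, w):
    # Hamiltonian vector field X_h applied to w
    return sp.diff(h, p) * sp.diff(w, q) - sp.diff(h, q) * sp.diff(w, p)

def f_hat(w, gauge=0):
    # f-hat w = -i*hbar X_f[w] + (f - theta(X_f) - gauge) w;
    # passing gauge = du(X_f) = X_f[u] uses the shifted potential theta'
    return -sp.I * hbar * X(f, w) + (f - p * sp.diff(f, p) - gauge) * w

phase = sp.exp(sp.I * u / hbar)
lhs = f_hat(phase * psi, gauge=X(f, u))  # f-hat' acting on psi'
rhs = phase * f_hat(psi)                 # (f-hat psi)'
print(sp.simplify(sp.expand(lhs - rhs)))  # 0
```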
<p>This suggests that we should think of the wavefunction as a section of a
<em>line bundle</em> $L$ with transition functions of the form
$e^{\frac{i}{\hslash}u}$, on whose sections the operators act.
Furthermore, if we write the action of $\hat{f}$ as</p>
\[\hat{f}\psi = -i\hslash\left(\mathrm{d}{\psi} -\frac{i}{\hslash}\theta\cdot\psi\right)(X_f) + f,\]
<p>we see that the term in brackets is precisely the local expression of a
connection (or covariant derivative) on $L$:</p>
\[\nabla = \mathrm{d}-\frac{i}{\hslash}\theta,\]
<p>whose curvature is
precisely</p>
\[R^\nabla = -\frac{i}{\hslash}\mathrm{d}{\theta} = \frac{i}{\hslash}\omega.\]
<p>We call the bundle $L$ the <strong>prequantum line bundle</strong>, and from its
curvature we can see that its Chern class is</p>
\[c_1(L) = \frac{1}{2\pi\hslash}[\omega].\]
<p>With this in mind, we can think of the state space as the set of
<em>sections</em> of the prequantum line bundle, and the quantization of a
function $f\in C^\infty(M)$ is</p>
\[\hat{f}\psi = -i\hslash\nabla_{X_f}\psi + f\psi.\]
<h1 id="the-first-obstruction">The first obstruction</h1>
<p>It turns out that having a line bundle $L$ with a connection whose
curvature is $\frac{i}{\hslash}\omega$ imposes a <em>restriction</em> on the
symplectic form. Namely, we require $\frac{1}{2\pi\hslash}[\omega]$ to
be an <em>integral</em> cohomology class.</p>
<p>Let’s see this briefly. Consider a cover $\{U_j\}$ of $M$,
where each $U_j$ is contractible and such that on each $U_j$ we have
symplectic potentials $\theta_j$. Then on intersections, we can define
functions $u_{ij}:U_i\cap U_j\to \mathbb{R}$ satisfying
\[\mathrm{d}u_{ij} = \theta_i - \theta_j.\]
<p>As we saw above, the
transition functions of the prequantum line bundle $L$ are of the form
$\exp(\frac{i}{\hslash}u_{ij})$, and so they must satisfy the cocycle
conditions</p>
\[\exp(\frac{i}{\hslash}(u_{ij}+u_{jk}+u_{ki}))=1,\]
<p>which
means that the functions $z_{ijk}:U_i\cap U_j\cap U_k\to \mathbb{R}$
defined as</p>
\[z_{ijk} = \frac{1}{2\pi\hslash}(u_{ij}+u_{jk}+u_{ki})\]
<p>are
integer-valued, and so must be constant. The collection of these $z$
functions forms a Čech 2-cocycle, and its cohomology class
$[z]\in\check{H}^2(M,\mathbb{Z})$ agrees precisely with the class
$\frac{1}{2\pi\hslash}[\omega]$.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></p>
<p>This <em>integrality condition</em> is satisfied trivially in the case of
$\mathbb{R}^{n}\times \mathbb{R}^{n}$, since the symplectic form is
exact and thus its cohomology class is zero. In fact, in the case of any
cotangent bundle, the canonical symplectic form is exact and so its
cohomology class is zero. This is good news! This tells us that we can
at least <em>attempt</em> to quantize classical systems.</p>
<p>We say that a symplectic manifold that satisfies the integrality
condition is <strong>prequantizable</strong>.</p>
<h1 id="reducing-the-number-of-variables-polarizations">Reducing the number of variables: Polarizations</h1>
<p>Now that we’ve dealt with the problem that the quantization of a
function depended on the choice of a symplectic potential, we move on to
discuss the other problem: How do we generalize the idea of a function
depending only on “half the number of variables” to a symplectic
manifold, where the choice of (Darboux) coordinates is not canonical?</p>
<p>Let’s look at the specific example of $\mathbb{R}^n\times \mathbb{R}^n$
and let’s try to see how to take it to a coordinate-free context. We
said that the <em>true</em> set of wavefunctions was $L^2(\mathbb{R}^n)$,
functions that depend only on the position coordinates $q^\mu$ but not
on the momenta $p_\mu$.</p>
<p>In a symplectic manifold, we don’t have a consistent choice of every
$q^\mu$ and $p_\mu$ globally. However, we can relax this and simply ask for
a splitting of the local coordinates into $q$’s and $p$’s in such a way
that they don’t get mixed together as we move around the manifold. Maybe
when we change charts, the $q$’s get mixed among themselves, maybe the
$p$’s get mixed among themselves, but we don’t have $q$’s turning into
$p$’s or vice-versa. This way, we can have a notion of a function being
“independent from the $p$’s” without <em>actually</em> having globally-defined
symplectic coordinates. We just need a notion of “directions along the
momenta” at each point, and so a function whose derivative vanishes
along those directions will be independent of the momenta.</p>
<p>How do we achieve this in practice? At each point $x\in M$, we want to
choose a half-dimensional subspace $P_x$ of the tangent space $T_xM$.
This half-dimensional subspace will be the space of “momentum
directions”, but what does that mean exactly? How can we make sure that
we don’t introduce “position directions” in $P_x$? Well, in the standard
case of $\mathbb{R}^n\times \mathbb{R}^n$, each position direction
$\frac{\partial }{\partial q^\mu}$ is paired with a momentum direction
$\frac{\partial }{\partial p_\mu}$ in such a way that
$\omega(\frac{\partial }{\partial q^\mu},\frac{\partial }{\partial p_\mu})=1$.
If we fix some direction $\frac{\partial }{\partial p_\mu}$, for <em>all
other momenta</em> directions $\frac{\partial }{\partial p_\nu}$, we will
have</p>
\[\omega(\frac{\partial }{\partial p_\mu},\frac{\partial }{\partial p_\nu}) = 0.\]
<p>So if we have a collection of $n$ independent vectors $v_1$, $\dots$,
$v_n$, we know that they are all “of the same type” if the symplectic
form vanishes on all of them:</p>
\[\omega(v_\mu,v_\nu)=0.\]
<p>The subspace
spanned by $v_1,\dots,v_n$ is half-dimensional and the symplectic form
vanishes on it; i.e., it is a <em>Lagrangian</em> subspace.</p>
<p>So that’s it! What we want is a choice of a <em>Lagrangian</em> subspace
$P_x\subset T_xM$ at every point. This choice must depend smoothly on $x$, so
that we get a <em>Lagrangian distribution</em> $P\subset TM$. Furthermore, we
want to be able to choose coordinates locally that are adapted to this
distribution, so it must also be <em>integrable</em>. We say that an integrable
Lagrangian distribution on $M$ is a <strong>polarization</strong>.</p>
<p>With a polarization $P$, we can restrict the space of states to those
wavefunctions (sections of the prequantum line bundle) that are
<em>covariantly constant</em> along $P$. That is, we say that the <strong>quantum
state space</strong> $\mathcal{H}(M,\omega,P)$ associated to the symplectic
manifold $(M,\omega)$ and polarization $P$ is the set of sections $\psi$
of $L$ which satisfy that for all $X\in P$,</p>
\[\nabla_X\psi = 0.\]
<p>In the case of $\mathbb{R}^n\times\mathbb{R}^n$, the “standard”
distribution is the <em>vertical distribution</em>, which is spanned by the
momentum directions $\frac{\partial }{\partial p_\mu}$. We then have
that a section $\psi$ of the prequantum line bundle $L$ is in the
quantum state space if for all $\mu$,</p>
\[\nabla_{\frac{\partial }{\partial p_\mu}}\psi = \frac{\partial \psi}{\partial p_\mu} = 0,\]
<p>which is precisely what we wanted.</p>
<h1 id="restricting-the-set-of-quantizable-observables">Restricting the set of quantizable observables</h1>
<p>We fixed the “too many variables” problem by introducing a polarization
and asking the quantum states to be sections of the prequantum line
bundle that are covariantly constant along the polarization. But now we
need the quantization of a function to preserve this property! We say
that a function $f\in C^\infty(M)$ is <strong>quantizable</strong> with respect to a
polarization $P$ if it preserves the quantum state space
$\mathcal{H}(M,\omega,P)$. That is, if $\psi$ is covariantly constant
along $P$, then $\hat{f}\psi$ must <em>also</em> be covariantly constant along
$P$. That is, for $X\in P$,</p>
\[\begin{aligned}
\nabla_X(\hat{f}\psi) &=-i\hslash\nabla_X\nabla_{X_f}\psi + \nabla_X(f\psi)\\
&=-i\hslash\left(R(X,X_f)\psi+\nabla_{X_f}\nabla_X\psi + \nabla_{[X,X_f]}\psi\right) +f\nabla_X\psi + \mathrm{d}{f}(X)\psi\\
&=-i\hslash\left(\frac{i}{\hslash}\omega(X,X_f)\psi+\nabla_{[X,X_f]}\psi\right) + \omega(X_f,X)\psi\\
&=-i\hslash\nabla_{[X,X_f]}\psi = 0.\end{aligned}\]
<p>Here we used the fact that the curvature of the connection is
$\frac{i}{\hslash}\omega$ and $\iota_{X_f}\omega = \mathrm{d}{f}$. In
conclusion, a sufficient condition for $\hat{f}$ to preserve the covariant
constancy of $\psi$ is that
\[[X_f,X]\in P\quad\text{for all }X\in P.\]
<p>So we say that the set of <strong>quantizable observables</strong>
$\mathrm{Obs}(M,\omega,P)\subset C^\infty(M)$ is</p>
\[\mathrm{Obs}(M,\omega,P) = \left\{f\in C^\infty(M)|[X_f,X]\in P \text{ for all }X\in P\right\}.\]
<p>It can be readily checked that for $\mathbb{R}^n\times \mathbb{R}^n$
with the vertical distribution, the quantizable observables are the
functions $f\in C^\infty(\mathbb{R}^n\times \mathbb{R}^n)$ satisfying</p>
\[\frac{\partial ^2f}{\partial p_\nu\partial p_\mu} = 0.\]
<p>Therefore a
function $f$ is quantizable if and only if it is at most linear in the
$p$ variables, i.e. it is of the form</p>
\[f(\mathbf{q},\mathbf{p}) = g(\mathbf{q}) + h^\mu(\mathbf{q})p_\mu,\]
<p>for some functions $g, h^\mu\in C^\infty(\mathbb{R}^n)$.</p>
<p>Surprise surprise. The <em>kinetic energy $p^2/2m$ is not quantizable</em>. Oh
my god. How is this okay? The <em>free particle</em> is not quantizable. With
this polarization, which is the <em>most natural</em> one, essentially none of
the physically significant Hamiltonians are quantizable.</p>
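<p>The computation behind this claim is easy to replicate: a short calculation gives $[X_f,\partial/\partial p] = -\frac{\partial^2 f}{\partial p^2}\frac{\partial}{\partial q} + \frac{\partial^2 f}{\partial q\partial p}\frac{\partial}{\partial p}$, which stays vertical exactly when $\partial^2 f/\partial p^2 = 0$. A sympy sketch showing that observables linear in $p$ pass while the kinetic energy fails:</p>

```python
import sympy as sp

q, p = sp.symbols('q p')
m = sp.symbols('m', positive=True)
w = sp.Function('w')(q, p)  # arbitrary test function

def X(h, g):
    # Hamiltonian vector field X_h applied to g
    return sp.diff(h, p) * sp.diff(g, q) - sp.diff(h, q) * sp.diff(g, p)

def bracket_vert(h, g):
    # [X_h, d/dp] applied to g
    return sp.expand(X(h, sp.diff(g, p)) - sp.diff(X(h, g), p))

# f = g(q) + h(q)*p is fine: the bracket has no d/dq component
g1, h1 = sp.Function('g')(q), sp.Function('h')(q)
print(bracket_vert(g1 + h1 * p, w))    # = h'(q) * dw/dp: still vertical

# the kinetic energy is not: the bracket sticks out of the polarization
print(bracket_vert(p**2 / (2 * m), w))  # = -(1/m) * dw/dq
```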
<h1 id="takeaway-or-oh-no-im-angry-about-geometric-quantization-again">Takeaway, or, Oh no I’m angry about geometric quantization again</h1>
<p>We took a few steps towards formalizing the canonical quantization rule
of physics. The first task was properly defining the problem: find a
quantization rule that takes observables $f$ on a symplectic manifold,
and returns operators $\widehat{f}$ on some Hilbert space, satisfying
the canonical commutation rule</p>
\[[\widehat{f},\widehat{g}]= i\hslash\widehat{\left\{f,g\right\}}.\]
<p>Unfortunately, this cannot
be done in a way that is consistent with <em>all</em> our desired axioms of
quantization, so we had to throw out a few.</p>
<p>After that, we looked for a quantization of general symplectic
manifolds. Our inspiration was the Lie algebra (anti-)homomorphism of
the Poisson structure on a symplectic manifold and Hamiltonian vector
fields,</p>
\[\begin{aligned}
f&\mapsto X_f\\
\left\{f,g\right\}&\mapsto X_{\left\{f,g\right\}} = -[X_f,X_g]. \end{aligned}\]
<p>Multiplying by $i\hslash$ we get a naive quantization rule, and after a
few corrections we obtained a proper quantization of functions.</p>
<p>So in summary, we got the following: A symplectic manifold $(M,\omega)$
is <strong>prequantizable</strong> if its symplectic form satisfies the integrality
condition:</p>
\[\frac{1}{2\pi\hslash}[\omega] \in H^2(M,\mathbb{Z}).\]
<p>In
this case, there exists a <strong>prequantum line bundle</strong> $L\to M$, whose
first Chern class is precisely $c_1(L) = \frac{1}{2\pi\hslash}[\omega]$,
along with a connection $\nabla$ with curvature
$\frac{i}{\hslash} \omega$, which we call the <strong>prequantum connection</strong>.</p>
<p>Now given a polarization $P\subset TM$, which is an integrable
Lagrangian distribution, we define the <strong>quantum state space</strong>
associated to $(M,\omega,P)$, as the space of sections of $L$ that are
covariantly constant along $P$:</p>
\[\mathcal{H}(M,\omega,P) = \left\{\psi\in \Gamma(L): \nabla_X\psi = 0\text{ for all }X\in P\right\}.\]
<p>The space of <strong>quantizable observables</strong> is</p>
\[\mathrm{Obs}(M,\omega,P) = \left\{f\in C^\infty(M)|[X_f,X]\in P \text{ for all }X\in P\right\},\]
<p>and the <strong>quantization</strong> of a quantizable observable $f\in \mathrm{Obs}$
is</p>
\[\hat{f} = -i\hslash\nabla_{X_f} + f.\]
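<p>As a final sanity check (one degree of freedom, vertical polarization, $\theta = p\,\mathrm{d}q$): quantizing an observable $f = g(q) + h(q)p$ by this formula and applying it to a polarized $\psi = \psi(q)$ should return something that is again independent of $p$. A sympy sketch:</p>

```python
import sympy as sp

q, p = sp.symbols('q p')
hbar = sp.symbols('hbar', positive=True)
g = sp.Function('g')(q)
h = sp.Function('h')(q)
psi = sp.Function('psi')(q)  # polarized: no p dependence

f = g + h * p  # a quantizable observable: at most linear in p

# f-hat psi = -i*hbar nabla_{X_f} psi + f psi, with nabla = d - (i/hbar) theta
Xf_psi = sp.diff(f, p) * sp.diff(psi, q) - sp.diff(f, q) * sp.diff(psi, p)
nabla_psi = Xf_psi - (sp.I / hbar) * p * sp.diff(f, p) * psi
f_hat_psi = sp.expand(-sp.I * hbar * nabla_psi + f * psi)

print(f_hat_psi)              # = -i*hbar*h(q)*psi'(q) + g(q)*psi(q)
print(sp.diff(f_hat_psi, p))  # 0: the result is again polarized
```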
<p>This quantization scheme satisfies a lot (but not <em>all</em>) of the
properties that we expect from a quantization scheme.</p>
<p>So what’s missing? The biggest problem is that the space of quantizable
observables is hilariously small. Even in the simplest case of canonical
quantization of $\mathbb{R}^n\times \mathbb{R}^n$ and the <em>obvious</em>
choice of a polarization, we can only quantize functions that are at
most <em>linear</em> in momentum. That means no kinetic energy. We can choose
another polarization (like the horizontal polarization, spanned by the
position directions), but now we can only quantize things that are
linear in position (so, for example, no harmonic oscillator).</p>
<p>If we are looking at the harmonic oscillator though, there is a way to
quantize it with the scheme above, but that requires extending our
notion of polarizations and allowing <em>complex</em> distributions. However,
even in this case, the quantization is not correct: the spectrum of the
harmonic oscillator has an incorrect ground state energy. There are ways
to fix this, though,<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> but <em>even then</em> there’s still a lot to be
desired because we are left with a huge and cumbersome structure that’s
needed to quantize the <em>simplest physical system</em>. And to be honest, if
your quantization scheme needs so many complicated modifications and
conditions to quantize <em>the harmonic oscillator</em> properly, then maybe
you should re-think it.</p>
<p>So why do we care about geometric quantization?</p>
<p>Well... I think we don’t? If by <em>we</em> you mean most physicists. It is
abundantly clear that it is a cumbersome tool which doesn’t have much
use in physics, and it barely works even for the simplest systems.
However, there is <em>one</em> problem where it seems to work pretty well:
quantum Chern-Simons theory. Without going into much detail, this is a
<em>topological quantum field theory</em> of three-manifolds, and it can in
some cases be written nicely in terms of the geometric quantization of
some rather complicated moduli spaces... Which is weird. Why would
geometric quantization work for a horribly complicated topological
quantum field theory, but not for the <em>harmonic oscillator</em>?</p>
<p>For mathematicians, though, geometric quantization is a new(ish)
geometric toy to play around with that has axioms and theorems and
conjectures. If we’re still interested in making geometric quantization
work regardless of its applicability to physics, then there’s a few
questions that still need to be addressed, the most important of which
is the dependence on the polarization. If we choose different
polarizations, we get different quantizations, but we don’t see anything
like this in quantum mechanics. In quantum mechanics, we just...
quantize. No polarization, no fuss. So we’d expect geometric
quantization to be independent of the polarization.
<p>In what sense exactly, though? There’s mainly two ways to see this. The
first idea was pioneered by Kostant and Sternberg,<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> and it consists
of noticing that in quantum mechanics one can <em>also</em> work with
wavefunctions defined in the momentum variables <em>only</em>, and that the
$q$-wavefunctions and the $p$-wavefunctions are related by a Fourier
transform. In our geometric language, this means that there is a Fourier
transform map relating the quantizations with the vertical and
horizontal polarizations, call it
$F: \mathcal{H}_q\to \mathcal{H}_p$. So in general, with two
<em>transverse</em> polarizations $P$, $P’$, we expect to find a generalized
Fourier transform map $F:\mathcal{H}_P\to \mathcal{H}_{P’}$.<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup></p>
<p>The other way to think about this was pioneered by Axelrod, della
Pietra, Witten, and Hitchin. Suppose that you have a collection of
polarizations that can be parametrized by a manifold $\mathcal{T}$. For
each point $\sigma\in \mathcal{T}$, we have a polarization $P_\sigma$ in
$TM$, and we can get the quantization $\mathcal{H}_\sigma$ with respect
to this polarization. If the collection of polarizations is <em>good
enough</em>, we hope that these quantizations can be put together into a
<em>vector bundle</em> $\mathcal{H}$ over $\mathcal{T}$, such that
$\mathcal{H}_\sigma$ is the fiber above $\sigma\in \mathcal{T}$.</p>
<p>If we have two polarizations represented by $\sigma_0$, $\sigma_1$, we
can take a path $\sigma(t)$ in parameter space connecting them, and if
we had a connection on this vector bundle, then parallel transport along
$\sigma(t)$ would identify the fibers $\mathcal{H}_{\sigma_0}$ and
$\mathcal{H}_{\sigma_1}$. However, this should be independent from the
choice of the path, so the connection should be <em>flat</em>. Or almost. Now
we play the “ah but quantum mechanics is in the <em>projectivization</em> of
the Hilbert space!” card so it suffices to have a connection that is
<strong>projectively flat</strong>. This is called a Hitchin connection.</p>
<p align="middle">
<img src="/assets/posts/geometric-quantization/hitchin-connection.png" width="60%" />
</p>
<p>Neither of these approaches has been proved in general, only in a few
specific cases.</p>
<p>As a mathematical theory, geometric quantization has been steadily
developed since its introduction in the late 1960s, and since then it has
lost almost all of its ambitions of becoming a useful theory for
physics. My biggest gripe with it is the complete loss of focus on the
<em>observables</em>: it focuses almost entirely on the construction of the Hilbert
space, and the quantization of observables is forgotten, which is
ridiculous because that was the initial motivation for all the geometric
constructions! The Hilbert space is the least important part of a
quantum theory: there’s ways to do quantum mechanics without a Hilbert
space, and interacting quantum field theory doesn’t even have a
well-defined underlying Hilbert space of states. Quantum mechanics is a
theory about <em>observables</em>, not states.</p>
<p>Oh no, I’m angry about geometric quantization again.</p>
<h1 id="the-gory-details">The gory details</h1>
<p>First, we want to show that our first correction of the naive
quantization rule breaks the commutation relations. The quantization
rule is</p>
\[f\mapsto \hat{f} = -i\hslash X_f + f.\]
<p>So take
$f, g\in C^\infty(M)$. We have</p>
\[\begin{aligned}
~[\widehat{f},\widehat{g}] &= [-i\hslash X_f + f, -i\hslash X_g + g]\\
&= -\hslash^2[X_f,X_g] -i\hslash([f,X_g] + [X_f,g])\end{aligned}\]
<p>The commutator of a vector field $X$ and a function $g$ acts on a
function $\psi$ as</p>
\[[X,g]\psi = X[g\psi] - gX[\psi] = gX[\psi] + X[g]\psi - gX[\psi] = X[g]\psi,\]
<p>and therefore we write</p>
\[[X,g]= X[g].\]
<p>With this, the commutator of
$\hat{f}$ and $\hat{g}$ becomes</p>
\[\begin{aligned}
~[\widehat{f},\widehat{g}] &= \hslash^2X_{\left\{f,g\right\}} -i\hslash(X_f[g] - X_g[f])\\
&= \hslash^2X_{\left\{f,g\right\}} -i\hslash(\left\{g,f\right\} - \left\{f,g\right\})\\
&= \hslash^2X_{\left\{f,g\right\}} +2i\hslash\left\{f,g\right\}\\
&= i\hslash\left(-i\hslash X_{\left\{f,g\right\}} +\left\{f,g\right\}\right) +i\hslash\left\{f,g\right\}\\
&= i\hslash(\widehat{\left\{f,g\right\}} + \left\{f,g\right\}).\end{aligned}\]
<p>So indeed, we have a leftover term.</p>
<p>Now we want to show that once we choose a gauge (i.e. a symplectic
potential) $\theta$, the quantization rule</p>
\[f\mapsto \hat{f} = -i\hslash X_f + f - \theta(X_f)\]
<p>satisfies the
canonical commutation relations. From the result above, we can skip a
few steps, since the terms $f$ and $\theta(X_g)$ commute with each other
(they’re just multiplication operators).</p>
\[\begin{aligned}
~[\widehat{f},\widehat{g}] & = [-i\hslash X_f + f - \theta(X_f),-i\hslash X_g + g - \theta(X_g)]\\
&= \hslash^2X_{\left\{f,g\right\}} +2i\hslash\left\{f,g\right\} +i\hslash([X_f,\theta(X_g)] +[\theta(X_f),X_g])\\
&= \hslash^2X_{\left\{f,g\right\}} +2i\hslash\left\{f,g\right\} +i\hslash(X_f[\theta(X_g)] -X_g[\theta(X_f)]).\end{aligned}\]
<p>Now we use the fact that for any $1$-form $\alpha$ and vector fields
$X,Y$,</p>
\[\mathrm{d}{\alpha}(X,Y) = X[\alpha(Y)] - Y[\alpha(X)] - \alpha([X,Y]),\]
<p>so that the rightmost term is</p>
\[\begin{aligned}
X_f[\theta(X_g)] -X_g[\theta(X_f)] &= \mathrm{d}{\theta}(X_f,X_g) + \theta([X_f,X_g])\\
&= -\omega(X_f,X_g) -\theta(X_{\left\{f,g\right\}})\\
&= -\left\{f,g\right\} - \theta(X_{\left\{f,g\right\}}).\end{aligned}\]
<p>Putting everything back together, we get</p>
\[\begin{aligned}
~[\widehat{f},\widehat{g}] &= \hslash^2X_{\left\{f,g\right\}} +2i\hslash\left\{f,g\right\} -i\hslash\left\{f,g\right\} - i\hslash\theta(X_{\left\{f,g\right\}})\\
&=i\hslash\left( -i\hslash X_{\left\{f,g\right\}} +\left\{f,g\right\} - \theta(X_{\left\{f,g\right\}})\right)\\
&=i\hslash\widehat{\left\{f,g\right\}}.\end{aligned}\]
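<p>As a concrete sanity check of both computations, here’s a small SymPy sketch (mine, not from the post) on $M=\mathbb{R}^2$ with $\omega = \mathrm{d}q\wedge\mathrm{d}p$ and symplectic potential $\theta = p\,\mathrm{d}q$, using the same sign conventions as above (so that $X_f[g] = \left\{g,f\right\}$):</p>

```python
import sympy as sp

q, p, hbar = sp.symbols('q p hbar', real=True)
psi = sp.Function('psi')(q, p)

def pb(f, g):
    # Poisson bracket {f,g} = f_q g_p - g_q f_p
    return sp.diff(f, q)*sp.diff(g, p) - sp.diff(g, q)*sp.diff(f, p)

def naive(f, u):
    # first guess: f_hat = -i*hbar*X_f + f, with X_f[u] = {u, f}
    return -sp.I*hbar*pb(u, f) + f*u

def prequantize(f, u):
    # gauge-fixed rule: f_hat = -i*hbar*X_f + f - theta(X_f), theta = p dq
    return naive(f, u) - p*sp.diff(f, p)*u   # theta(X_f) = p * df/dp

def commutator(op, f, g):
    return sp.expand(op(f, op(g, psi)) - op(g, op(f, psi)))

f, g = q**2, p   # so {f, g} = 2q
leftover = sp.simplify(commutator(naive, f, g)
                       - sp.I*hbar*naive(pb(f, g), psi))        # nonzero!
exact = sp.simplify(commutator(prequantize, f, g)
                    - sp.I*hbar*prequantize(pb(f, g), psi))     # zero
```

<p>The <code>leftover</code> term is exactly the extra $i\hslash\left\{f,g\right\}$ computed above, and it vanishes once the $\theta(X_f)$ correction is included.</p>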
<h1 id="references">References</h1>
<ul>
<li>
<p>Segal, I.E., <em>Quantization of Nonlinear Systems</em>. Journal of
Mathematical Physics 1, 468 (1960).</p>
</li>
<li>
<p>Ali, S. T. and Engliš, M. <em>Quantization Methods: A Guide for
Physicists and Analysts.</em> Rev. Math. Phys. 17, 391–490 (2005).</p>
</li>
<li>
<p>Puta, M. <em>Hamiltonian Mechanical Systems and Geometric
Quantization.</em> (Springer Netherlands, 1993).</p>
</li>
<li>
<p>Woodhouse, N. M. J. <em>Geometric Quantization.</em> (Clarendon Press ;
Oxford University Press, 1992).</p>
</li>
<li>
<p>McDuff, D. and Salamon, D. <em>Introduction to Symplectic Topology.</em>
(Oxford University Press, 2017).</p>
</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Recall that the Poisson bracket between two functions $f(q,p)$ and
$g(q,p)$ is given by</p>
\[\left\{f,g\right\} = \frac{\partial f}{\partial q}\frac{\partial g}{\partial p} - \frac{\partial g}{\partial q}\frac{\partial f}{\partial p}.\]
<p><a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This tautological form is somewhat canonical in
$\mathbb{R}^{n}\times \mathbb{R}^n$. In fact, it is a canonical
one-form on any cotangent bundle $T^*Q$, which gives a canonical
symplectic structure to any cotangent bundle. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>This also holds more generally when the symplectic manifold in
question is the cotangent bundle of some configuration manifold. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Why would we even care about symplectic manifolds that are not
cotangent bundles? Physically, I mean. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>For more details into this Čech-de Rham correspondence, check
Woodhouse, Section A.6. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>With something called the metaplectic correction, which I won’t
get into. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Kostant mentions it first in <em>Symplectic Spinors</em>, 1974. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Funnily enough, in trying to realize this map explicitly, Kostant
and Sternberg found the need to take square roots of the volume
form, i.e. of finding a metaplectic structure on the symplectic
manifold. So the metaplectic correction was <em>never</em> motivated by
fixing the wrong spectrum of the harmonic oscillator; it was just a
technicality needed to relate two polarizations! Everything went wrong. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Santiago Quintero de los RíosCome along and watch as I slowly descend into madness trying to understand what Geometric Quantization is, and why anyone would care about it.art.exe2021-05-19T00:00:00+02:002021-05-19T00:00:00+02:00http://homotopico.com/generative-art/2021/05/19/generative-art<p>A few months ago I was writing notes about things for my PhD and I had a lot of
fun making the figures for it.</p>
<p align="middle">
<img src="/assets/posts/generative_art/four-holed-sphere-landscape.png" />
</p>
<p>This one depicts slicing
a sphere with four holes in half to obtain two things
that are <em>topologically equivalent</em> to a pair of pants. I thought, <em>huh, this is a lot of fun, let’s make more pretty pictures unrelated to my PhD</em>.
I’m reasonably good at basic math,<sup>[citation needed]</sup> I’m not terrible at programming,<sup>[citation needed]</sup> and I like geometric art so I can put those together and make</p>
<p align="center"><b>
G E N E R A T I V E A R T
</b></p>
<p>which means having a computer make the pretty pictures for me, so I don’t need the amazing manual skills that other artists have (and I don’t, because I don’t have the patience to learn new things).</p>
<p>I think this is as good a time as any to plug my <a href="http://instagram.com/eschersfish">Instagram account</a>, where I post these and other pieces of generative art.</p>
<p>Here’s a few highlights of my favorites and the math behind them.</p>
<h2 id="rhumb-line">Rhumb line</h2>
<div style="text-align: center">
<blockquote class="imgur-embed-pub" lang="en" data-id="a/sMnNN7n" data-context="false"><a href="//imgur.com/a/sMnNN7n"></a></blockquote><script async="" src="//s.imgur.com/min/embed.js" charset="utf-8"></script>
</div>
<p>A rhumb line is a curve of <em>constant heading</em> on a sphere. If you’re flying on a plane and you keep your compass in the same position, you’ll trace a rhumb line. A fun thing about them is that they wrap infinitely many times around the poles, in finite time!</p>
<p>Let’s parametrize the sphere with latitude $\lambda\in [-\pi/2,\pi/2]$ and longitude $\varphi\in [0,2\pi)$, so that a point on the unit sphere has coordinates $x, y , z$ given by</p>
\[\begin{aligned}
x &= \cos\lambda\cos\varphi,\\
y &= \cos\lambda\sin\varphi,\\
z &= \sin\lambda.
\end{aligned}\]
<p>We want to find a curve $\gamma(t)$
whose tangent vector forms a constant
angle $\beta$ with the unit vector in the $\lambda$ direction. At the point
with coordinates $(\lambda,\varphi)$, the tangent space is generated by
the orthonormal vectors $\hat{\lambda},\hat{\varphi}$, given by</p>
\[\begin{aligned}
\hat{\lambda} &= -\sin\lambda\cos\varphi\hat{x} -\sin\lambda\sin\varphi \hat{y} +\cos\lambda\hat{z}\\
\hat{\varphi} &= -\sin\varphi\hat{x} + \cos\varphi\hat{y}.
\end{aligned}\]
<p>So we want the tangent vector $\dot{\gamma}$ to be</p>
\[\dot{\gamma} = \cos\beta\hat{\lambda} + \sin\beta\hat{\varphi}.\]
<p>If we write $\gamma(t)$ as a function of the latitude and longitude $\lambda(t),\varphi(t)$,
i.e. very explicitly</p>
\[\gamma(t) = x(\lambda(t),\varphi(t))\hat{x}+y(\lambda(t),\varphi(t))\hat{y} + z(\lambda(t),\varphi(t))\hat{z},\]
<p>then from the chain rule and the expressions for $\hat{\lambda},\hat{\varphi}$ we obtain</p>
\[\dot{\gamma} = \dot{\lambda}\hat{\lambda} + \cos\lambda\dot{\varphi}\hat{\varphi}.\]
<p>Comparing with our desired $\dot\gamma = \cos\beta\hat{\lambda} + \sin\beta\hat\varphi$, we get a first-order ODE on $\lambda(t),\varphi(t)$:</p>
\[\begin{aligned}
\dot{\lambda} &= \cos\beta,\\
\dot\varphi &= \frac{\sin\beta}{\cos\lambda},
\end{aligned}\]
<p>which has a solution</p>
\[\begin{aligned}
\lambda(t) &= \lambda_0 + t\cos\beta\\
\varphi(t) &= \varphi_0 + \tan\beta\ln\left(\frac{\sec\lambda(t)+\tan\lambda(t)}{\sec\lambda_0 + \tan\lambda_0}\right).
\end{aligned}\]
<p>A nice side-effect of this solution is that this is a <em>constant speed</em> parametrization of the curve. In fact, the speed is always $1$, so the parameter $t$ is also the arc-length of the curve. So $\varphi$ goes berserk and shoots off to infinity as you reach $\lambda = \pm \pi/2$,
which means that the rhumb line wraps around the pole infinitely many times, but it remains finite in length!</p>
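<p>If you want to draw one of these yourself, here’s a minimal NumPy sketch of the parametrization above (the function name and parameters are my own; it stops just short of the pole, where $\varphi$ blows up):</p>

```python
import numpy as np

def rhumb_line(beta, lam0=0.0, phi0=0.0, n=2000):
    """Sample the rhumb line with constant heading beta, using the
    arc-length parametrization derived above."""
    # stop just short of the pole, which is reached at t = (pi/2 - lam0)/cos(beta)
    t_max = 0.98*(np.pi/2 - lam0)/np.cos(beta)
    t = np.linspace(0.0, t_max, n)
    lam = lam0 + t*np.cos(beta)
    sec, tan = 1/np.cos(lam), np.tan(lam)
    sec0, tan0 = 1/np.cos(lam0), np.tan(lam0)
    phi = phi0 + np.tan(beta)*np.log((sec + tan)/(sec0 + tan0))
    return np.stack([np.cos(lam)*np.cos(phi),
                     np.cos(lam)*np.sin(phi),
                     np.sin(lam)], axis=1)

pts = rhumb_line(beta=1.2)
# unit-speed parametrization: consecutive samples are (almost) equally spaced
seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
```

<p>Since $t$ is arc length, the sample points come out evenly spaced along the curve for free, which is exactly what you want for plotting.</p>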
<h2 id="the-magnetic-pendulum">The magnetic “pendulum”</h2>
<p align="middle">
<img src="/assets/posts/generative_art/magnetic_pendulum_00.png" width="32%" />
<img src="/assets/posts/generative_art/magnetic_pendulum_01.png" width="32%" />
<img src="/assets/posts/generative_art/magnetic_pendulum_02.png" width="32%" />
</p>
<p align="middle">
<img src="/assets/posts/generative_art/magnetic_pendulum_03.png" width="32%" />
<img src="/assets/posts/generative_art/magnetic_pendulum_04.png" width="32%" />
</p>
<p>This is a typical example of a chaotic system. Consider a
spherical pendulum with a magnetic bob. Nearby, on a plane
below the lowest equilibrium point,
there are three (or more) magnets. After releasing the bob,
gravity and the three magnets start pushing it around, until it eventually settles due to friction. If the magnets are strong enough, the pendulum will settle above one of them.</p>
<p>If we assume that the pendulum is very long or that the oscillations are small, then we can approximate the pendulum
as a two-dimensional harmonic oscillator. In this case, if $x$ is the position of the pendulum and $x_i$ are the positions of the magnets with “magnetic charges” $q_i$, then the total force acting on the bob is</p>
\[F = -kx - b\dot x+ \sum_i \frac{q_i}{(\|x-x_i\|^2 + h^2)^{5/2}}(x-x_i),\]
<p>where $k$ is a parameter representing gravity, $h$ is the distance from the plane where the magnets are to the minimum height of the pendulum, and $b$ is a parameter that controls viscous drag.
We generate this image as follows: For every pixel, integrate the equations of motion with starting point at that pixel, from rest. Eventually, the system settles over one of the magnets. We color
the pixel according to the magnet where it ended up. As an aside, I used a GLSL shader to generate these images, because well, that’s a lot of iterations.</p>
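<p>Here’s a rough NumPy version of that recipe (the real renders used a GLSL shader; the magnet layout and the values of $k$, $b$, $h$, $q_i$ below are made up for illustration, with $q_i&lt;0$ so that the magnets attract in the sign convention above):</p>

```python
import numpy as np

# three magnets at the corners of an equilateral triangle;
# all parameter values here are made up for illustration
magnets = np.array([[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]])
q, k, b, h = -1.0, 0.2, 0.1, 0.3   # q < 0: attractive magnets

def force(x, v):
    # the total force from the post: spring + drag + magnets
    f = -k*x - b*v
    for xi in magnets:
        d = x - xi
        f += q*d/(d @ d + h*h)**2.5
    return f

def settle(x0, dt=1e-2, steps=15000):
    # release from rest and integrate (semi-implicit Euler) until it
    # settles, then report the index of the nearest magnet
    x, v = np.array(x0, dtype=float), np.zeros(2)
    for _ in range(steps):
        v += dt*force(x, v)
        x += dt*v
    return int(np.argmin(np.linalg.norm(magnets - x, axis=1)))
```

<p>For the full image you’d call <code>settle</code> once per pixel, which is why offloading the iterations to a shader is the sane choice.</p>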
<h2 id="the-hopf-bundle">The Hopf bundle</h2>
<p align="middle">
<img src="/assets/posts/generative_art/hopf_bundle_grid.png" width="50%" />
</p>
<p align="middle">
<img src="/assets/posts/generative_art/hopf_bundle_black.png" width="50%" />
</p>
<p>I still don’t know how to make something <em>pretty</em> with this aside from showing some of the fibers, but the mathematics is pretty nice. Colors would be nice, but I had some issues with rendering that made colors impossible.</p>
<p>Consider the 3-sphere $S^3$ as a subset of $\mathbb{C}^2$:
\(S^3 = \{(z_1,z_2)\in\mathbb{C}^2:|z_1|^2 + |z_2|^2 = 1\}.\)
From the defining equation $|z_1|^2 + |z_2|^2 = 1$, we can choose an angle $\theta\in[0,\pi]$ such that
$|z_1| = \cos(\theta/2)$ and $|z_2| = \sin(\theta/2)$. Then
we can write</p>
\[\begin{aligned}
z_1 &= \cos(\theta/2)e^{i\xi_1}\\
z_2 &= \sin(\theta/2)e^{i\xi_2}
\end{aligned}\]
<p>for some unique $\xi_1$, $\xi_2\in[0,2\pi)$. If we <em>just</em> look at the parameters $\theta,\xi_1$ and $\xi_2$, it
seems that we have a line segment and two circles. However, when $\theta = 0$ and $\theta = \pi$, the points
“collapse” in a way, since either $z_1$ or $z_2$ are fixed at zero. That makes one of the parameters $\xi$ irrelevant
at those points.</p>
<p>Compare this with the parametrization of the $2$-sphere $S^2$ with spherical coordinates $\theta$ and $\varphi$,
where $\theta$ is the zenithal coordinate and $\varphi$ is the azimuthal coordinate. When $\theta = 0$ or $\theta = \pi$, then $\varphi$ becomes irrelevant because a horizontal slice of the sphere collapses from a circle to a point.</p>
<p>This kinda suggests that we can intuitively think of $S^3$ as $S^2$ with a bunch of circles attached to every point. Let’s make it
precise: Define a map $f:S^3\to S^2$, where $f(z_1,z_2)$ is the point with zenithal coordinate $\theta$ defined as above, and
with azimuthal coordinate $\varphi = \xi_2 - \xi_1$. We can make this very explicit. From the definition above, we have
that</p>
\[\begin{aligned}
\sin(\theta) &= 2|z_1 z_2|,\\
\cos(\theta) &= |z_1|^2 - |z_2|^2,\\
\sin(\varphi) &= \left|\frac{z_1}{z_2}\right|\operatorname{Im}\left(\frac{z_2}{z_1}\right),\\
\cos(\varphi) &= \left|\frac{z_1}{z_2}\right|\operatorname{Re}\left(\frac{z_2}{z_1}\right),
\end{aligned}\]
<p>and so $f(z_1,z_2)\in S^2$ is the point with coordinates $(x,y,z)$ given by</p>
\[\begin{aligned}
x &= 2|z_1|^2\operatorname{Re}\left(\frac{z_2}{z_1}\right),\\
y &= 2|z_1|^2\operatorname{Im}\left(\frac{z_2}{z_1}\right),\\
z &= |z_1|^2 - |z_2|^2.
\end{aligned}\]
<p>The map $f:S^3\to S^2$ is surjective, and it is invariant under the $S^1$ action $(z_1,z_2)\cdot e^{i\lambda} = (e^{i\lambda}z_1,e^{i\lambda}z_2)$. In fact, if we fix a point on the fiber, then every other
point on it can be reached by this action in a unique way. We say that the
action is transitive and free on the fibers. Therefore, the fibers of $f$ are circles, and we can write $S^3$ as the union of all
these circle fibers over the sphere.</p>
<p>The next problem is trying to visualize the fibers. This is a big issue because our monkey brain only understands
figures that can be embedded in three-dimensional Euclidean space. So now our task is to “flatten out” $S^3$ onto
$\mathbb{R}^3$, very much in the same way that we would want to flatten out the sphere (like the surface of Earth)
onto the plane (as if to make a map). I guess there’s many different ways to do this, like there are many different
spherical projections for cartography, but we are going to stick with the easiest one: Stereographic projection.
I won’t go into details, but the <a href="https://en.wikipedia.org/wiki/Stereographic_projection">wikipedia page</a> is a good place to
start. We have an easy formula for this projection $P:S^3\to \mathbb{R}^3$. If we write a point
on $S^3$ as $(\mathbf{x},h)$ with $\mathbf{x}\in \mathbb{R}^3$, then</p>
\[P(\mathbf{x},h) = \frac{1}{1-h}\mathbf{x}.\]
<p>So now the visualization goes as follows: Choose a point on the sphere $S^2$ with coordinates $(\theta,\varphi)$. This
point does not determine a <em>unique</em> element of $S^3$, but an entire circle. We can parametrize the entire circle
by finding a single point on the fiber, and then acting with the circle action:</p>
\[f^{-1}(\theta,\varphi) = \{(\cos(\theta/2)e^{i(\lambda+\varphi)},\sin(\theta/2)e^{i\lambda}):\lambda\in[0,2\pi) \}.\]
<p>Now we project the points to $\mathbb{R}^3$ with the stereographic projection. In the end, we obtain
a map parametrizing the fibers after projecting them:</p>
\[\mathrm{fiber}(\theta,\varphi,\lambda) = \frac{1}{1-\sin(\theta/2)\sin(\lambda)}(\cos(\theta/2)\cos(\lambda+\varphi),\cos(\theta/2)\sin(\lambda+\varphi),\sin(\theta/2)\cos(\lambda)).\]
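<p>Here’s a quick NumPy sketch of this parametrization, together with a sanity check that undoes the stereographic projection and verifies that the points really lie on a single fiber (along a fiber, $z_1\bar{z}_2$ is constant):</p>

```python
import numpy as np

def hopf_fiber(theta, phi, n=200):
    # points of the stereographically projected fiber over (theta, phi),
    # straight from the parametrization above
    lam = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    denom = 1.0 - np.sin(theta/2)*np.sin(lam)
    return np.stack([np.cos(theta/2)*np.cos(lam + phi),
                     np.cos(theta/2)*np.sin(lam + phi),
                     np.sin(theta/2)*np.cos(lam)], axis=1)/denom[:, None]

# sanity check: invert the stereographic projection P(x, h) = x/(1 - h)
pts = hopf_fiber(1.0, 0.7)
r2 = (pts**2).sum(axis=1)
hh = (r2 - 1.0)/(r2 + 1.0)          # recover the fourth coordinate
X = pts*(1.0 - hh)[:, None]         # recover the first three
z1, z2 = X[:, 0] + 1j*X[:, 1], X[:, 2] + 1j*hh
```

<p>Plotting several of these circles for a grid of $(\theta,\varphi)$ values gives the nested-tori pictures above.</p>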
<p>The Hopf fibration is a prime example of a principal bundle, and it shows up here and there in mathematics and physics.
Most notably, in the quantization of the Dirac monopole, the wavefunctions become sections of the associated bundle to the Hopf
bundle via some representation of $U(1)$. You can see <a href="/notes/">my notes on connections on principal bundles</a> for explanations
of some of these terms, but I plan to write something more detailed on the Hopf bundle and the Dirac monopole sometime in the future.</p>Santiago Quintero de los RíosWherein I discuss some pretty pictures I made with the computer and the mathematics behind them.What is a gauge field? Part 2: Matter Fields2020-12-16T00:00:00+01:002020-12-16T00:00:00+01:00http://homotopico.com/gauge-fields/2020/12/16/matter-fields<p>Get this post on <a href="/assets/docs/pdf_posts/matter-fields.pdf">pdf here</a>.</p>
<p><a href="/gauge-fields/2019/09/06/gauge-fields-01.html">A long time
ago</a>,
we showed that a “quicker” way to solve Maxwell’s equations (at least in
the vacuum)
is by writing the fields $\mathbf{E}$ and $\mathbf{B}$ in terms of
electromagnetic potentials $\varphi$ and $\mathbf{A}$ which satisfy</p>
\[\begin{aligned}
\mathbf{B} &= \mathbf{\nabla}\times\mathbf{A};\\
\mathbf{E} &= -\mathbf{\nabla}\varphi-\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}.\end{aligned}\]
<p>These potentials are not uniquely determined: given a smooth function
$\Lambda$, we can define new potentials as</p>
\[\begin{aligned}
\mathbf{A}' &= \mathbf{A} +\mathbf{\nabla}\Lambda,\\
\varphi' &= \varphi - \frac{1}{c}\frac{\partial \Lambda}{\partial t},\end{aligned}\]
<p>and these give rise to the <em>same</em> electric and magnetic fields
$\mathbf{E}$, $\mathbf{B}$.</p>
<p>We want to see what happens if we add a test particle, and our end goal
is seeing how it looks in the <em>quantum case</em>. What role do the potentials play in the evolution of the particle?</p>
<p>A particle of charge $e$ moving in an electric field $\mathbf{E}$ and a
magnetic field $\mathbf{B}$ feels a force given by</p>
\[\mathbf{F}_{\text{Lor}} = e\left(\mathbf{E}+ \frac{1}{c}\mathbf{v}\times \mathbf{B}\right),\]
<p>where $\mathbf{v}$ is the velocity of the particle. This force, called
the <strong>Lorentz force</strong>, is a vector function that depends on the position
$\mathbf{x}$ and velocity $\mathbf{v}$ of the particle, and possibly on
time (if the fields $\mathbf{B}$, $\mathbf{E}$ do).</p>
<p>If we wanted to introduce the Lorentz force to a quantum-mechanical
system, we would need a Hamiltonian $H$ such that the Hamilton equations
of motion</p>
\[\begin{aligned}
\dot{q}^i &= \frac{\partial H}{\partial p_i}\\
\dot{p}_i &= -\frac{\partial H}{\partial q^i}\end{aligned}\]
<p>are equivalent to the usual Newtonian equations of motion</p>
\[m\ddot{\mathbf{x}} = \mathbf{F}_{\text{Lor}}.\]
<p>With this Hamiltonian,
we would apply our favorite quantization rules.</p>
<p>But how do we find such a Hamiltonian? The best way to do so is to write
the Lorentz force in terms of a Lagrangian, and then do the Legendre
transform to obtain a Hamiltonian.</p>
<h1 id="the-classical-case">The classical case</h1>
<p>Here’s where the <em>gauge fun</em> begins. Choose a pair of potentials
$\varphi,\mathbf{A}$ for $\mathbf{E},\mathbf{B}$. These satisfy</p>
\[\begin{aligned}
\mathbf{B} &= \mathbf{\nabla}\times\mathbf{A};\\
\mathbf{E} &= -\mathbf{\nabla}\varphi-\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}.\end{aligned}\]
<p>It can be shown (and we do so below in the last section) that a
Lagrangian for the Lorentz force is given by</p>
\[L(\mathbf{x},\mathbf{v},t) = \frac{1}{2}m\|\mathbf{v}\|^2 -e\varphi(\mathbf{x},t)+\frac{e}{c}\mathbf{v}\cdot \mathbf{A}(\mathbf{x},t).\]
<p>Of course, one way to “prove” that this is a Lagrangian for the Lorentz
force is simply showing that the Euler-Lagrange equations are precisely
the equations of the Lorentz force. But that’s really <em>ad hoc</em>, and
in <a href="#finding-the-lagrangian">the gory details</a> below we show a more “natural” derivation.</p>
<p>From the Lagrangian, we see that the canonical momenta conjugate to the
positions $\mathbf{x}$ are</p>
\[p_i =\frac{\partial L}{\partial \dot{x}^i}= m\dot{x}^i+\frac{e}{c}A^i,\]
<p>so</p>
\[\mathbf{p} = m\dot{\mathbf{x}} + \frac{e}{c}\mathbf{A}.\]
<p>Note that
$\mathbf{p}$ depends <em>explicitly</em> on the vector potential $\mathbf{A}$,
which is a sign that it is <em>not</em> a physical quantity, since we can
change the potential $\mathbf{A}$ to another physically equivalent one.
This means that we shouldn’t be able to measure $\mathbf{p}$, since
$\mathbf{A}$ is not uniquely determined. We will return to this issue of
<em>physical quantities</em> later.</p>
<p>With the Legendre transform, we can find the Hamiltonian (this is just a
computation, no tricks involved):</p>
\[H(\mathbf{x},\mathbf{p},t) = \dot{\mathbf{x}}\cdot \mathbf{p} - L = \frac{1}{2m}\left\|\mathbf{p}-\frac{e}{c}\mathbf{A}(\mathbf{x},t)\right\|^2 + e\varphi(\mathbf{x},t).\]
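<p>Since the Legendre transform really is “just a computation”, here’s a SymPy sketch (my own) that carries it out and checks the result:</p>

```python
import sympy as sp

m, e, c = sp.symbols('m e c', positive=True)
t, x, y, z = sp.symbols('t x y z')
v = sp.Matrix(sp.symbols('v1 v2 v3'))      # velocity components
P = sp.Matrix(sp.symbols('p1 p2 p3'))      # canonical momentum components
phi = sp.Function('phi')(x, y, z, t)
A = sp.Matrix([sp.Function(f'A{i}')(x, y, z, t) for i in (1, 2, 3)])

L = m*v.dot(v)/2 - e*phi + (e/c)*v.dot(A)

# canonical momenta p_i = dL/dv_i = m v_i + (e/c) A_i
p_canon = sp.Matrix([sp.diff(L, vi) for vi in v])
assert sp.simplify(p_canon - (m*v + (e/c)*A)) == sp.zeros(3, 1)

# Legendre transform H = v.p - L, with v written in terms of p
v_of_p = (P - (e/c)*A)/m
H = sp.expand(v_of_p.dot(P) - L.subs(dict(zip(list(v), list(v_of_p)))))
H_expected = (P - (e/c)*A).dot(P - (e/c)*A)/(2*m) + e*phi
```
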
<div style="text-align: center"><img src="/assets/posts/matter-fields/objection.png" width="256" /></div>
<p>The Hamiltonian (and the Lagrangian too) has an explicit dependence on
the potentials $\varphi, \mathbf{A}$, whereas the Lorentz force is only
dependent on the fields $\mathbf{E}$ and $\mathbf{B}$. If we change the
potentials via a gauge transformation, the Lorentz force doesn’t change,
but the Lagrangian does! So there’s something funky going on here. How
do we reconcile this?</p>
<p>Well, the Lagrangian changes, but the equations of motion don’t. Let’s
see this explicitly: let $\Lambda$ be a smooth (time-dependent) function
and let’s do the gauge transformation</p>
\[\begin{aligned}
\mathbf{A}' &= \mathbf{A}+\mathbf{\nabla}\Lambda\\
\varphi' &=\varphi-\frac{1}{c}\frac{\partial \Lambda}{\partial t}.\end{aligned}\]
<p>Substituting in the Lagrangian and doing a little reordering, we obtain</p>
\[L'(\mathbf{x},\mathbf{v},t) = \frac{1}{2}m\|\mathbf{v}\|^2 -e\varphi(\mathbf{x},t)+\frac{e}{c}\mathbf{v}\cdot \mathbf{A}(\mathbf{x},t) +\frac{e}{c}\left(\mathbf{v}\cdot\mathbf{\nabla}\Lambda +\frac{\partial \Lambda}{\partial t}\right) := L(\mathbf{x},\mathbf{v},t) + \frac{e}{c}\frac{\mathrm{d}\Lambda}{\mathrm{d}t}.\]
<p>Here, we have defined $L’$ as $L$ but with $\varphi’$ and $\mathbf{A}’$
instead of $\varphi,\mathbf{A}$, and we have defined the total
derivative of $\Lambda$ as</p>
\[\left(\frac{\mathrm{d}\Lambda}{\mathrm{d}t}\right)(\mathbf{x},\mathbf{v},t):= \mathbf{v}\cdot\mathbf{\nabla}\Lambda +\frac{\partial \Lambda}{\partial t}.\]
<p>This total derivative coincides with the derivative obtained from the
chain rule, if we evaluate it on $\mathbf{x}(t),\dot{\mathbf{x}}(t),t$
for a curve $\mathbf{x}:\mathbb{R}\to \mathbb{R}^3$. That is,</p>
\[\left(\frac{\mathrm{d}\Lambda}{\mathrm{d}t}\right)(\mathbf{x}(t),\dot{\mathbf{x}}(t),t)= \frac{\mathrm{d}}{\mathrm{d}t}(\Lambda(\mathbf{x}(t),t)).\]
<p>Therefore, our transformed Lagrangian has the form</p>
\[L' = L + \frac{\mathrm{d}F}{\mathrm{d}t}\]
<p>with $F=(e/c)\Lambda$. This
tells us that the Lagrangian itself is not gauge-invariant; however,
since it transforms up to a <em>total derivative</em>, the equations of motion
are invariant. We show this below in the <a href="#gauge-invariance-of-the-equations-of-motion">gory details</a>.</p>
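<p>The transformation of the Lagrangian is easy to check symbolically too; here’s a SymPy sketch (mine, not from the post) verifying that $L’ - L = (e/c)\,\mathrm{d}\Lambda/\mathrm{d}t$:</p>

```python
import sympy as sp

m, e, c = sp.symbols('m e c', positive=True)
t, x, y, z = sp.symbols('t x y z')
v = sp.Matrix(sp.symbols('v1 v2 v3'))
r = (x, y, z)
phi = sp.Function('phi')(*r, t)
A = sp.Matrix([sp.Function(f'A{i}')(*r, t) for i in (1, 2, 3)])
Lam = sp.Function('Lambda')(*r, t)
grad = lambda f: sp.Matrix([sp.diff(f, xi) for xi in r])

def lagrangian(phi, A):
    return m*v.dot(v)/2 - e*phi + (e/c)*v.dot(A)

# gauge-transformed potentials
A2, phi2 = A + grad(Lam), phi - sp.diff(Lam, t)/c

# L' - L should be (e/c) times the total derivative of Lambda
delta = sp.expand(lagrangian(phi2, A2) - lagrangian(phi, A))
dLam_dt = v.dot(grad(Lam)) + sp.diff(Lam, t)
```
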
<p>What about the Hamiltonian picture? First, let’s see what happens to the
canonical momenta. Under a gauge transformation, they transform as</p>
\[p'_i = \frac{\partial L'}{\partial \dot{x}^i} = \frac{\partial L}{\partial \dot{x}^i} +\frac{\partial }{\partial \dot{x}^i}\left(\frac{\mathrm{d}F}{\mathrm{d}t}\right) = p_i +\frac{\partial F}{\partial x^i}.\]
<p>Therefore, the canonical momentum changes under a change of gauge as
$\mathbf{p}’=\mathbf{p}+\mathbf{\nabla}F$. If we perform the Legendre
transform of $L’$, we obtain a Hamiltonian in terms of this new
canonical momentum $\mathbf{p}’$</p>
\[H'(\mathbf{x},\mathbf{p}',t) = \dot{\mathbf{x}}(\mathbf{p}')\cdot\mathbf{p}' - L'(\mathbf{x},\dot{\mathbf{x}}(\mathbf{p}'),t);\]
<p>where we have made it explicit that we must write $\dot{\mathbf{x}}$ in
terms of $\mathbf{p}’$ and not $\mathbf{p}$. Carrying out the
computation we obtain</p>
\[H'(\mathbf{x},\mathbf{p}',t) =\frac{1}{2m}\left\|\mathbf{p}'-\frac{e}{c}\mathbf{A}'(\mathbf{x},t)\right\|^2 + e\varphi'(\mathbf{x},t).\]
<p>Does this mean that the Hamiltonian is gauge invariant? <em>No, it does
not</em>. This Hamiltonian is written in terms of the new momentum
$\mathbf{p}’$, and we need to see how it relates to the Hamiltonian with
the old momentum $\mathbf{p}$. So we substitute all the new momenta and
potentials in terms of the old:</p>
\[\begin{aligned}
H'(\mathbf{x},\mathbf{p}',t) &= \frac{1}{2m}\left\|\mathbf{p} + \frac{e}{c}\mathbf{\nabla}\Lambda -\frac{e}{c}\mathbf{A} -\frac{e}{c}\mathbf{\nabla}\Lambda \right\|^2 + e\varphi(\mathbf{x},t) -\frac{e}{c}\frac{\partial \Lambda}{\partial t}\\
&= \frac{1}{2m}\left\|\mathbf{p}-\frac{e}{c}\mathbf{A}\right\|^2 + e\varphi(\mathbf{x},t) -\frac{e}{c}\frac{\partial \Lambda}{\partial t}\\
&= H(\mathbf{x},\mathbf{p},t) -\frac{e}{c}\frac{\partial \Lambda}{\partial t}.\end{aligned}\]
<p>Thus, the Hamiltonian is in general <em>not</em> gauge-invariant! But once
again, the day is saved since Hamilton’s <em>equations of motion</em> are.
Again, we leave this to the <a href="#gauge-invariance-of-the-equations-of-motion">gory details</a>.</p>
<p>In conclusion, even though the Lagrangian and Hamiltonian are explicitly
dependent on the potentials, and therefore <em>not</em> gauge-invariant, the
equations of motions <em>are</em> gauge-invariant, so the dynamics of the
system are well-defined. This is to be expected, since both the
Lagrangian and the Hamiltonian picture are equivalent to Newton’s
equations of motion, which do not even include the potentials
explicitly.</p>
<h1 id="the-quantum-case">The quantum case</h1>
<p>Now that we have a Hamiltonian, we can write the Schrödinger equation</p>
\[i\hslash \frac{\partial \psi}{\partial t} = H\psi = \frac{1}{2m}\left(\mathbf{p}-\frac{e}{c}\mathbf{A}\right)^2\psi + e\varphi\psi.\]
<p>Here comes another problem: The Hamiltonian is dependent on the choice
of potential! But this time we can’t shield ourselves under the “don’t
worry, the equations of motion are safe” that we used in the last
section, since the Schrödinger equation <em>is</em> the equation of motion! So
there’s nothing stopping the evolution of the wavefunction $\psi$ from
depending on the choice of potential!</p>
<p>But we <em>do</em> have one more trick up our sleeve. The wavefunction is
not the measurable object, but rather its square norm $|\psi|^2$, and in
general the <em>expectation values</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> of Hermitian operators $\hat{O}$</p>
\[\left\langle \psi\middle|\hat{O} \middle\vert \psi\right\rangle.\]
<p>This means that we can save this Hamiltonian if we guarantee that
whenever we change the potentials to some new ones
$\mathbf{A}’,\varphi’$ (via some gauge transformation), then every
solution $\psi$ of the Schrödinger equation and every observable
$\hat{O}$ have <em>physically equivalent</em> solutions $\psi’$ (to the
Schrödinger equation with the new potentials) and observables $\hat{O}’$
such that</p>
\[\left\langle \psi'\middle|\hat{O}' \middle\vert \psi'\right\rangle=\left\langle \psi\middle|\hat{O} \middle\vert \psi\right\rangle.\]
<p>One way to guarantee this is with unitary transformations. Suppose that
$\psi$ is a solution to the Schrödinger equation with potentials
$\varphi,\mathbf{A}$. Now let $\Lambda$ be a smooth function and let
$\varphi’,\mathbf{A}’$ be the gauge-transformed potentials</p>
\[\begin{aligned}
\mathbf{A}'&=\mathbf{A}+\nabla\Lambda\\
\varphi' &= \varphi -\frac{1}{c}\frac{\partial \Lambda}{\partial t}.\end{aligned}\]
<p>Suppose that there exists a unitary transformation $U(\Lambda)$
associated to $\Lambda$ such that the new “gauge-transformed”
wavefunction</p>
<p>\(\begin{aligned}
\psi' &= U\psi\end{aligned}\)
is a solution of the Schrödinger
equation with the potentials $\varphi’,\mathbf{A}’$. If for every
observable $\hat{O}$ we define</p>
\[\hat{O}' = U\hat{O}U^{-1},\]
<p>then necessarily</p>
\[\left\langle \psi'\middle|\hat{O}' \middle\vert \psi'\right\rangle= \left\langle U\psi\middle|U\hat{O}U^{-1} \middle\vert U\psi\right\rangle=\left\langle \psi\middle|U^{\dagger}U\hat{O} \middle\vert \psi\right\rangle= \left\langle \psi\middle|\hat{O} \middle\vert \psi\right\rangle.\]
<p>This follows from the fact that $U^\dagger U = I$, since $U$ is unitary.</p>
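<p>This invariance is easy to check numerically. Here’s a small sketch (purely illustrative, not part of the derivation: the state, observable, and unitary are random matrices with no physics in them) of the computation above:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random normalized state |psi>.
psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi /= np.linalg.norm(psi)

# A random Hermitian observable O.
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
O = (M + M.conj().T) / 2

# A random unitary U (the Q factor of a complex QR decomposition is unitary).
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# Transform state and observable together: |psi'> = U|psi>, O' = U O U^dagger.
psi_new = U @ psi
O_new = U @ O @ U.conj().T

ev_old = psi.conj() @ O @ psi
ev_new = psi_new.conj() @ O_new @ psi_new
assert np.allclose(ev_old, ev_new)  # expectation values agree
```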
<p>Now we have a problem. The rule $\hat{O}\mapsto U\hat{O}U^{-1}$ gives us
a way to transform observables between different gauges. However, we may
already have a definition of the observable in a different gauge! For
example, if we write the momentum operator $\mathbf{p}$ in the position
representation, it becomes $-i\hslash\nabla$. This definition <em>should</em>
be the same for <em>all</em> gauges, since changing gauges does not alter the
coordinates. That means that we define</p>
\[\mathbf{p}_{\mathbf{A},\varphi} = \mathbf{p}_{\mathbf{A}',\varphi'}\overset{\text{position rep.}}{:=}-i\hslash\nabla\]
<p>for all potentials<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. However, we also have a gauge transformation
rule that tells us how operators transform between gauges. Do these
prescriptions agree with one another? That is, do we have</p>
\[U\mathbf{p}_{\mathbf{A},\varphi}U^{-1} \overset{?}{=}\mathbf{p}_{\mathbf{A}',\varphi'}?\]
<p>As we will see below, the answer is <strong>no</strong>, since, assuming that $U$
depends only on positions and not momenta,</p>
\[U\mathbf{p}_{\mathbf{A},\varphi}U^{-1} = \mathbf{p}_{\mathbf{A},\varphi} + i\hslash (\nabla U)U^{-1}\neq \mathbf{p}_{\mathbf{A}',\varphi'}.\]
<p>There is a conflict between the transformation law and our <em>definition</em>
of the momentum operator between different gauges. This tells us that
the observable $\mathbf{p}$ is <em>not physical</em>, because the results of
observations cannot be defined consistently between gauges (and
remember, up to this point, the gauges are just mathematical tools).</p>
<p>In general, we say that an observable $\hat{O}$ is <strong>physical</strong> if its
definition in different gauges is consistent with the transformation
law. That is, if</p>
\[U\hat{O}_{\mathbf{A},\varphi}U^{-1} = \hat{O}_{\mathbf{A}',\varphi'}.\]
<p>Another example of an <em>un</em>physical observable is the potential
$\mathbf{A}$. If we assume that the unitary transformation $U$ depends
only on the position, then it commutes with $\mathbf{A}$, so</p>
\[U\mathbf{A}U^{-1} = \mathbf{A} \overset{!}{\neq} \mathbf{A}'.\]
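<p>Both claims can be verified symbolically. The sketch below (a one-dimensional SymPy illustration, using the explicit $U(\Lambda) = \exp\left(\frac{ie}{\hslash c}\Lambda\right)$ that we derive below; the helper names are mine) conjugates the canonical and kinetic momenta by $U$ and compares the results with their definitions in the new gauge:</p>

```python
import sympy as sp

x = sp.symbols('x', real=True)
e, c, hbar = sp.symbols('e c hbar', positive=True)
Lam = sp.Function('Lambda', real=True)(x)
A = sp.Function('A', real=True)(x)
f = sp.Function('f')(x)   # an arbitrary test function

U = sp.exp(sp.I * e * Lam / (hbar * c))   # the unitary derived below

def p(g):
    """Momentum operator in the position representation, p = -i hbar d/dx."""
    return -sp.I * hbar * sp.diff(g, x)

# Canonical momentum: conjugation picks up an extra gradient term,
# so U p U^{-1} differs from the fixed definition of p in the new gauge.
conj_p = sp.expand(U * p(f / U))
assert sp.simplify(conj_p - (p(f) - (e / c) * sp.diff(Lam, x) * f)) == 0

# Kinetic momentum p - (e/c)A: conjugation gives exactly p - (e/c)A'
# with A' = A + dLambda/dx, so this observable IS physical.
conj_kin = sp.expand(U * (p(f / U) - (e / c) * A * f / U))
A_new = A + sp.diff(Lam, x)
assert sp.simplify(conj_kin - (p(f) - (e / c) * A_new * f)) == 0
```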
<p>So how do we find $U(\Lambda)$? Does it even exist? In the <a href="#finding-the-unitary-transformation">gory details</a>
below, we show that the correct unitary transformation is</p>
\[U(\Lambda) = \exp\left(\frac{ie}{\hslash c}\Lambda\right),\]
<p>so that the wavefunction $\psi$ transforms as</p>
\[\psi' = \exp\left(\frac{ie}{\hslash c}\Lambda\right)\psi.\]
<p>Let’s check that $\psi’$ does indeed satisfy the Schrödinger equation with the
Hamiltonian with respect to the new gauge. We have</p>
\[\begin{aligned}
i\hslash\frac{\partial \psi'}{\partial t} &= i\hslash\frac{\partial U(\Lambda)}{\partial t}\psi + i\hslash U(\Lambda)\frac{\partial \psi}{\partial t}\\
&= -\frac{e}{c}\frac{\partial \Lambda}{\partial t}\exp\left(\frac{ie}{\hslash c}\Lambda\right)\psi + U(\Lambda)\left(i\hslash\frac{\partial \psi}{\partial t}\right).\end{aligned}\]
<p>By hypothesis $\psi$ satisfies the Schrödinger equation with the
potentials $\mathbf{A},\varphi$, so</p>
\[\begin{aligned}
i\hslash\frac{\partial \psi'}{\partial t} &= -\frac{e}{c}\frac{\partial \Lambda}{\partial t}\psi' + U(\Lambda)\left( \frac{1}{2m}\left(\mathbf{p}-\frac{e}{c}\mathbf{A}\right)^2 + e\varphi\right)\psi.\end{aligned}\]
<p>Now we note that</p>
\[U(\Lambda)\mathbf{p}= \mathbf{p}U(\Lambda) + [U(\Lambda),\mathbf{p}]= \mathbf{p}U(\Lambda) + i\hslash\nabla U(\Lambda)= \mathbf{p}U(\Lambda) - \frac{e}{c}\nabla\Lambda U(\Lambda) = \left(\mathbf{p}-\frac{e}{c}\nabla\Lambda\right)U(\Lambda).\]
<p>Therefore,</p>
\[U(\Lambda)\left(\mathbf{p}-\frac{e}{c}\mathbf{A}\right)^2 = \left(\mathbf{p}-\frac{e}{c}\mathbf{A}-\frac{e}{c}\nabla\Lambda\right)^2U(\Lambda)\]
<p>Since $U(\Lambda)$ depends only on position, then it commutes with
$\varphi$. Therefore, we obtain (after a little rearrangement)</p>
\[\begin{aligned}
i\hslash\frac{\partial \psi'}{\partial t} &= \frac{1}{2m}\left(\mathbf{p}-\frac{e}{c}\left(\mathbf{A}+\nabla\Lambda\right)\right)^2\psi' + e\left(\varphi-\frac{1}{c}\frac{\partial \Lambda}{\partial t}\right)\psi'\\
&= \frac{1}{2m}\left(\mathbf{p}-\frac{e}{c}\mathbf{A}'\right)^2\psi' + e\varphi'\psi'.\end{aligned}\]
<p>Finally, we note that the definition of $\mathbf{p}$ is the same for all
gauges, so we write $\mathbf{p}’ = \mathbf{p}$, and thus obtain</p>
\[i\hslash\frac{\partial \psi'}{\partial t} = H_{\mathbf{A}',\varphi'}\psi'.\]
<p>Therefore, the wavefunction
$\psi’ = \exp\left(\frac{ie}{\hslash c}\Lambda\right)\psi$ satisfies the
Schrödinger equation with the gauge-transformed potentials $\varphi’$,
$\mathbf{A}’$. In the context of gauge theories, we call $\psi$ a
<strong>matter field</strong>.</p>
<h1 id="minimal-coupling-and-covariant-derivatives">Minimal coupling and covariant derivatives</h1>
<p>The Schrödinger equation for the particle coupled to an electromagnetic
field is not very different from the free equation. If we start with the
free equation</p>
\[i\hslash\frac{\partial \psi}{\partial t} = \frac{1}{2m}\mathbf{p}^2\psi,\]
<p>and make the changes</p>
\[\begin{aligned}
\frac{\partial }{\partial t} &\mapsto \frac{\partial }{\partial t} +\frac{ie}{\hslash}\varphi,\\
\mathbf{p} &\mapsto \mathbf{p} -\frac{e}{c}\mathbf{A},\end{aligned}\]
<p>then we obtain the coupled equation</p>
\[i\hslash \frac{\partial \psi}{\partial t} = \frac{1}{2m}\left(\mathbf{p}-\frac{e}{c}\mathbf{A}\right)^2\psi + e\varphi\psi.\]
<p>This is called the <em>minimal coupling</em> prescription.</p>
<p>These seem like arbitrary changes. However, if we write everything in
the unified four-dimensional framework we talked about last time, we’ll
see that they are two components of a single substitution. Our four-dimensional coordinates are
$x^0 = ct$, $x^1=x$, $x^2=y$, $x^3=z$. We condense the fields and
potentials into four-dimensional differential forms: The potentials
become a one-form $A = A_\mu\mathrm{d}{x}^\mu$ with components</p>
\[\begin{aligned}
A_0 &= \varphi & A_i &= -\mathbf{A}^i,\end{aligned}\]
<p>and the fields become a two-form
$F = \frac{1}{2}F_{\mu\nu}\mathrm{d}{x}^\mu\wedge\mathrm{d}{x}^\nu$ with
components</p>
\[\begin{aligned}
F_{0i} &= \mathbf{E}^i & F_{ij} &= -\varepsilon_{ijk}\mathbf{B}^k,\end{aligned}\]
<p>satisfying</p>
\[F = \mathrm{d}{A}.\]
<p>Under a gauge transformation, the
electromagnetic potential $A$ transforms as</p>
\[A\mapsto A' = A -\mathrm{d}{\Lambda},\]
<p>and of course the field strength $F$ remains invariant.</p>
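<p>The invariance of $F$ is a one-line consequence of the symmetry of mixed partial derivatives, and we can have SymPy confirm it component by component (an illustrative sketch; the coordinate and function names are mine):</p>

```python
import sympy as sp

xs = sp.symbols('x0 x1 x2 x3', real=True)
Lam = sp.Function('Lambda', real=True)(*xs)
A = [sp.Function(f'A{mu}', real=True)(*xs) for mu in range(4)]
A_new = [A[mu] - sp.diff(Lam, xs[mu]) for mu in range(4)]   # A' = A - dLambda

def F(a, mu, nu):
    """Components of F = dA: F_{mu nu} = d_mu A_nu - d_nu A_mu."""
    return sp.diff(a[nu], xs[mu]) - sp.diff(a[mu], xs[nu])

# Gauge invariance: the dLambda contributions cancel by symmetry of
# mixed partials, so F'_{mu nu} = F_{mu nu} for all components.
assert all(sp.simplify(F(A, mu, nu) - F(A_new, mu, nu)) == 0
           for mu in range(4) for nu in range(4))
```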
<p>The minimal coupling prescription is now obtained by making the change</p>
\[\partial_\mu \mapsto \mathscr{D}_\mu:= \partial_\mu +\frac{ie}{\hslash c}A_\mu.\]
<p>The symbol $\mathscr{D}_{\mu}$ is called the <em>covariant derivative</em>.
Indeed, we can check that the components $\mathscr{D}_0$ and
$\mathscr{D}_i$ correspond to the operators
$\partial_t + \frac{ie}{\hslash}\varphi$ and
$\mathbf{p}-\frac{e}{c}\mathbf{A}$ that we discussed above.</p>
<p>Why do we care about this covariant derivative? If we transform to a new
gauge $A’=A-\mathrm{d}{\Lambda}$, then the covariant derivative changes
as</p>
\[\mathscr{D}_\mu\mapsto \mathscr{D}_\mu' = \mathscr{D}_\mu -\frac{ie}{\hslash c}\partial_\mu\Lambda.\]
<p>So it’s not quite gauge-invariant on its own. However, when we let
$\mathscr{D}_\mu$ act on the wavefunction $\psi$, and apply a gauge
transformation to <em>both</em> at the same time, we get</p>
\[\begin{aligned}
\mathscr{D}_\mu'\psi' &= \left(\mathscr{D}_\mu - \frac{ie}{\hslash c}\partial_\mu\Lambda\right)\exp\left(\frac{ie}{\hslash c}\Lambda\right)\psi\\
&=\partial_\mu\left(\exp\left(\frac{ie}{\hslash c}\Lambda\right)\psi\right) + \exp\left(\frac{ie}{\hslash c}\Lambda\right)\left(\frac{ie}{\hslash c}A_\mu\psi - \frac{ie}{\hslash c}\partial_\mu\Lambda \psi\right)\\
&=\exp\left(\frac{ie}{\hslash c}\Lambda\right)\left(\frac{ie}{\hslash c}\partial_\mu\Lambda \psi + \partial_\mu\psi + \frac{ie}{\hslash c}A_\mu\psi - \frac{ie}{\hslash c}\partial_\mu\Lambda \psi\right)\\
&=\exp\left(\frac{ie}{\hslash c}\Lambda\right)\mathscr{D}_\mu\psi\\
&=(\mathscr{D}_\mu\psi)'.\end{aligned}\]
<p>Thus, after applying the covariant derivative to the wavefunction $\psi$, we
get another wavefunction which transforms properly under gauge
transformations. We call this gauge <em>covariance</em>: if we apply the
gauge-transformed covariant derivative to the gauge-transformed matter
field, we get the same result as applying the un-transformed derivative
to the un-transformed field and <em>then</em> transforming the result.</p>
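<p>Here is the same computation done symbolically (a one-coordinate SymPy sketch of the four-dimensional covariant derivative, with my own helper names), checking that the gauge-transformed covariant derivative of the gauge-transformed field equals the transformed result:</p>

```python
import sympy as sp

x = sp.symbols('x', real=True)
e, c, hbar = sp.symbols('e c hbar', positive=True)
A = sp.Function('A', real=True)(x)        # one component A_mu
Lam = sp.Function('Lambda', real=True)(x)
psi = sp.Function('psi')(x)

U = sp.exp(sp.I * e * Lam / (hbar * c))

def D(a, f):
    """Covariant derivative along one coordinate: D = d + (ie/hbar c) a."""
    return sp.diff(f, x) + sp.I * e / (hbar * c) * a * f

lhs = D(A - sp.diff(Lam, x), U * psi)  # D'_mu psi' (gauge A' = A - dLambda)
rhs = U * D(A, psi)                    # (D_mu psi)'
assert sp.simplify(sp.expand(lhs - rhs)) == 0
```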
<h1 id="the-takeaway">The takeaway</h1>
<p>We started with some <em>fields</em> $\mathbf{E}$ and $\mathbf{B}$ which could
be written in terms of some <em>potentials</em> $\mathbf{A}$, $\varphi$. The
potentials are <em>not</em> uniquely determined, since we can change them by a
<em>gauge transformation</em>, and the fields remain the same. The quantities
that are invariant under these gauge transformations are <em>physical</em>.</p>
<p>In quantum mechanics, the wavefunction $\psi$ and <em>physical</em> observables
$\hat{O}$ might be gauge-dependent, but under a gauge transformation
they transform by a unitary transformation in such a way that the expectation
values of physical observables are all invariant.</p>
<p>In summary, we have the following objects and how they transform under a
gauge transformation:</p>
\[\begin{array}{ccl}
\mathbf{A}& \mapsto & \mathbf{A}' = \mathbf{A}+\nabla\Lambda\\[0.1em]
\varphi & \mapsto & \varphi' = \displaystyle \varphi -\frac{1}{c}\frac{\partial \Lambda}{\partial t}\\
\mathbf{E}& \mapsto & \mathbf{E} \\
\mathbf{B}& \mapsto & \mathbf{B} \\
\psi & \mapsto & \psi' = \displaystyle \exp\left(\frac{ie}{\hslash c}\Lambda\right)\psi\\[0.5em]
\hat{O} & \mapsto & \hat{O}' = \displaystyle \exp\left(\frac{ie}{\hslash c}\Lambda\right)\hat{O}\exp\left(-\frac{ie}{\hslash c}\Lambda\right).
\end{array}\]
<p>Finally, we saw that an “easy” way to go from the free
theory to the minimally coupled theory is substituting ordinary
derivatives with covariant derivatives:</p>
\[\partial_\mu\mapsto\mathscr{D}_\mu = \partial_\mu + \frac{ie}{\hslash c}A_\mu.\]
<p>This is how it is often done for more complicated gauge theories (which
we will explore later).</p>
<p>The next step is interpreting all these objects as <em>local</em>
representations of global objects in the theory of <em>principal bundles</em>.</p>
<h1 id="the-gory-details">The gory details</h1>
<h2 id="finding-the-lagrangian">Finding the Lagrangian</h2>
<p>Substituting the expressions for $\mathbf{E}$ and $\mathbf{B}$ in terms
of the potentials $\varphi$ and $\mathbf{A}$ in the Lorentz force, we
obtain</p>
\[\mathbf{F} = e\left(-\mathbf{\nabla}\varphi-\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t} + \frac{1}{c}\mathbf{v}\times (\mathbf{\nabla}\times\mathbf{A})\right).\]
<p>Now we use one of those super fun vector product identities,</p>
\[\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\mathbf{b}-(\mathbf{a}\cdot\mathbf{b})\mathbf{c},\]
<p>which becomes in our case</p>
\[\mathbf{v}\times (\mathbf{\nabla}\times\mathbf{A}) = \mathbf{\nabla}(\mathbf{v}\cdot \mathbf{A}) - (\mathbf{v}\cdot\mathbf{\nabla})\mathbf{A}.\]
<p>Therefore,</p>
\[\mathbf{F} = e\left(-\mathbf{\nabla}\left(\varphi-\frac{1}{c}\mathbf{v}\cdot\mathbf{A}\right) - \frac{1}{c}\left(\frac{\partial \mathbf{A}}{\partial t} +(\mathbf{v}\cdot\mathbf{\nabla})\mathbf{A}\right)\right).\]
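<p>If you don’t trust the vector identity (I never do), here’s a SymPy sketch that checks the rewriting component by component. One caveat: $\mathbf{v}$ is treated as a triple of independent constants, so $\boldsymbol{\nabla}$ does not act on it:</p>

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
v = sp.Matrix(sp.symbols('v1 v2 v3', real=True))  # constant components of v
A = sp.Matrix([sp.Function(f'A{i}', real=True)(x, y, z) for i in (1, 2, 3)])
coords = (x, y, z)

def curl(B):
    return sp.Matrix([sp.diff(B[2], y) - sp.diff(B[1], z),
                      sp.diff(B[0], z) - sp.diff(B[2], x),
                      sp.diff(B[1], x) - sp.diff(B[0], y)])

lhs = v.cross(curl(A))                                       # v x (curl A)
grad_vA = sp.Matrix([sp.diff(v.dot(A), s) for s in coords])  # grad(v . A)
v_grad_A = sp.Matrix([sum(v[k] * sp.diff(A[i], coords[k]) for k in range(3))
                      for i in range(3)])                    # (v . grad) A

# v x (curl A) = grad(v . A) - (v . grad) A, componentwise.
assert sp.expand(lhs - grad_vA + v_grad_A) == sp.zeros(3, 1)
```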
<p>Let’s plug this into Newton’s equation of motion. Let
$\mathbf{x}:\mathbb{R}\to \mathbb{R}^3$ be the trajectory of a particle
of mass $m$, and let $\dot{\mathbf{x}}$ be its velocity. Newton’s second
law reads</p>
\[m\ddot{\mathbf{x}}(t) = \mathbf{F}(\mathbf{x}(t),\dot{\mathbf{x}}(t),t) = e\left(-\mathbf{\nabla}\left(\varphi-\frac{1}{c}\dot{\mathbf{x}}\cdot\mathbf{A}\right) - \frac{1}{c}\left(\frac{\partial \mathbf{A}}{\partial t} +(\dot{\mathbf{x}}\cdot\mathbf{\nabla})\mathbf{A}\right)\right).\]
<p>It is important to note that here we are implicitly evaluating the
time-dependent fields $\varphi,\mathbf{A}$ at $(\mathbf{x}(t),t)$. In
particular, the rightmost term becomes, applying the chain rule,</p>
\[\frac{\partial \mathbf{A}}{\partial t}(\mathbf{x}(t),t) +((\dot{\mathbf{x}}\cdot\mathbf{\nabla})\mathbf{A})(\mathbf{x}(t),t) = \frac{\mathrm{d}}{\mathrm{d}t}\mathbf{A}(\mathbf{x}(t),t).\]
<p>Then Newton’s second law becomes, in components,</p>
\[m\ddot{\mathbf{x}}^i(t) = e\left(-\frac{\partial }{\partial x^i}\left(\varphi-\frac{1}{c}\sum_{k}\dot{\mathbf{x}}^k\mathbf{A}^k\right)-\frac{1}{c}\frac{\mathrm{d}}{\mathrm{d}t}\mathbf{A}^i(\mathbf{x}(t),t)\right).\]
<p>Now comes the <em>dirty trick</em>. We can write $\mathbf{A}^i$ as</p>
\[-\frac{1}{c}\mathbf{A}^i = -\frac{1}{c}\frac{\partial }{\partial \dot{x}^i}\left(\sum_{k}\dot{x}^k\mathbf{A}^k \right) = \frac{\partial }{\partial \dot{x}^i}\left(\varphi - \frac{1}{c}\sum_{k}\dot{x}^k\mathbf{A}^k \right).\]
<p>Similarly, we can write $\ddot{\mathbf{x}}^i$ as</p>
\[\ddot{\mathbf{x}}^i = \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial }{\partial \dot{x}^i}\frac{1}{2}\sum_{k}{\dot{x}^k}\dot{x}^k\right).\]
<p>With these replacements, Newton’s equation takes the form of an
Euler-Lagrange equation:</p>
\[\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial }{\partial \dot{x}^i}\frac{m}{2}\sum_{k}{\dot{x}^k}\dot{x}^k\right)= \frac{\partial }{\partial x^i}\left(-e\varphi+\frac{e}{c}\sum_{k}\dot{x}^k\mathbf{A}^k\right) +\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial }{\partial \dot{x}^i}\left(e\varphi - \frac{e}{c}\sum_{k}\dot{x}^k\mathbf{A}^k \right).\]
<p>Or, well, after a few rearrangements:</p>
\[\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial }{\partial \dot{x}^i}\left(\frac{m}{2}\sum_{k}{\dot{x}^k}\dot{x}^k -e\varphi + \frac{e}{c}\sum_{k}\dot{x}^k\mathbf{A}^k \right) - \frac{\partial }{\partial x^i}\left(\frac{m}{2}\sum_{k}{\dot{x}^k}\dot{x}^k -e\varphi + \frac{e}{c}\sum_{k}\dot{x}^k\mathbf{A}^k \right)=0.\]
<p>Therefore, we can use the Lagrangian</p>
\[L(\mathbf{x},\mathbf{v},t) = \frac{1}{2}m\|\mathbf{v}\|^2 -e\varphi(\mathbf{x},t)+\frac{e}{c}\mathbf{v}\cdot \mathbf{A}(\mathbf{x},t).\]
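<p>We can close the loop symbolically. The sketch below (SymPy, with acceleration symbols standing in for $\ddot{x}^i$ and a hand-rolled total time derivative; all helper names are mine) checks that the Euler-Lagrange equations of this Lagrangian are exactly Newton’s equations with the Lorentz force:</p>

```python
import sympy as sp

t = sp.symbols('t', real=True)
e, c, m = sp.symbols('e c m', positive=True)
X = sp.Matrix(sp.symbols('x y z', real=True))
V = sp.Matrix(sp.symbols('vx vy vz', real=True))    # velocity symbols
acc = sp.Matrix(sp.symbols('ax ay az', real=True))  # acceleration symbols
phi = sp.Function('varphi', real=True)(*X, t)
A = sp.Matrix([sp.Function(f'A{i}', real=True)(*X, t) for i in (1, 2, 3)])

L = m * V.dot(V) / 2 - e * phi + (e / c) * V.dot(A)

def total_dt(f):
    """Total time derivative along a trajectory: d/dt = d_t + v.grad + a.d_v."""
    return (sp.diff(f, t)
            + sum(V[k] * sp.diff(f, X[k]) for k in range(3))
            + sum(acc[k] * sp.diff(f, V[k]) for k in range(3)))

# E = -grad(varphi) - (1/c) dA/dt and B = curl(A), in components.
E = -sp.Matrix([sp.diff(phi, s) for s in X]) - A.diff(t) / c
B = sp.Matrix([sp.diff(A[2], X[1]) - sp.diff(A[1], X[2]),
               sp.diff(A[0], X[2]) - sp.diff(A[2], X[0]),
               sp.diff(A[1], X[0]) - sp.diff(A[0], X[1])])
lorentz = e * (E + V.cross(B) / c)

# Each Euler-Lagrange expression equals Newton's law with the Lorentz force.
for i in range(3):
    el = total_dt(sp.diff(L, V[i])) - sp.diff(L, X[i])
    assert sp.expand(el - (m * acc[i] - lorentz[i])) == 0
```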
<h2 id="gauge-invariance-of-the-equations-of-motion">Gauge-invariance of the equations of motion</h2>
<p>Under a gauge transformation, the Lagrangian changes as</p>
<p>\(L' = L + \frac{\mathrm{d}F}{\mathrm{d}t}\)
with $F=(e/c)\Lambda$.
Although the Lagrangian itself is not gauge-invariant, since it
transforms up to a <em>total derivative</em>, then the equations of motion are
invariant:</p>
\[\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L'}{\partial \dot{x}^i}\right) - \frac{\partial L'}{\partial x^i} &= \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L}{\partial \dot{x}^i}\right) - \frac{\partial L}{\partial x^i} +\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial }{\partial \dot{x}^i}\frac{\mathrm{d}F}{\mathrm{d}t}\right) - \frac{\partial }{\partial x^i}\left(\frac{\mathrm{d}F}{\mathrm{d}t}\right)\\
&= \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L}{\partial \dot{x}^i}\right) - \frac{\partial L}{\partial x^i} +\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial F}{\partial x^i}\right) - \frac{\partial }{\partial x^i}\left(\frac{\mathrm{d}F}{\mathrm{d}t}\right)\\
&= \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L}{\partial \dot{x}^i}\right) - \frac{\partial L}{\partial x^i}.\end{aligned}\]
<p>Here we used the fact that</p>
\[\frac{\partial }{\partial \dot{x}^i}\left(\frac{\mathrm{d}F}{\mathrm{d}t}\right) = \frac{\partial }{\partial \dot{x}^i}\left( \frac{\partial F}{\partial t} + \sum_k\dot{x}^k\frac{\partial F}{\partial x^k}\right) = \frac{\partial F}{\partial x^i}.\]
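<p>This little fact is also a quick symbolic check (a SymPy sketch, with velocity symbols standing in for $\dot{x}^k$):</p>

```python
import sympy as sp

t = sp.symbols('t', real=True)
X = sp.symbols('x y z', real=True)
V = sp.symbols('vx vy vz', real=True)   # velocity symbols standing in for xdot
F = sp.Function('F', real=True)(*X, t)

# Total derivative dF/dt along a trajectory, written with velocity symbols.
dFdt = sp.diff(F, t) + sum(v * sp.diff(F, s) for v, s in zip(V, X))

# d/d(xdot^i) of dF/dt equals dF/dx^i, since F has no velocity dependence.
assert all(sp.simplify(sp.diff(dFdt, V[i]) - sp.diff(F, X[i])) == 0
           for i in range(3))
```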
<p>In the Hamiltonian picture, the canonical momenta transform as</p>
<p>\(\mathbf{p}'=\mathbf{p}+\mathbf{\nabla}F,\)
and the Hamiltonian
transforms like</p>
\[H'(\mathbf{x},\mathbf{p}',t) = H(\mathbf{x},\mathbf{p},t) -\frac{\partial F}{\partial t}.\]
<p>Although the Hamiltonian is <em>not</em> gauge-invariant, the <em>equations of
motion</em> are. If we have a trajectory $\mathbf{x}(t)$, $\mathbf{p}(t)$
which satisfies</p>
\[\begin{aligned}
\dot{\mathbf{x}}^i &= \frac{\partial H}{\partial p_i}\\
\dot{\mathbf{p}}_i &= -\frac{\partial H}{\partial x^i},\end{aligned}\]
<p>then it also satisfies</p>
\[\begin{aligned}
\dot{\mathbf{x}}^i = \frac{\partial H}{\partial p_i} &= \frac{\partial }{\partial p_i}\left(H'(\mathbf{x},\mathbf{p}',t)+\frac{\partial F}{\partial t}\right)\\
&=\sum_{j}\frac{\partial H'}{\partial p'_j}\frac{\partial p'_j}{\partial p_i}\\
&=\frac{\partial H'}{\partial p'_i}.\end{aligned}\]
<p>The equations of motion for the momenta are more subtle. We have to note
that when we write $\mathbf{p’}=\mathbf{p}+\nabla F$, we are introducing
an <em>explicit</em> dependence of $\mathbf{p}’$ on the position variables
$x^i$. And so, we must be careful when applying the chain rule:</p>
\[\begin{aligned}
\dot{\mathbf{p}}'_i &= \dot{\mathbf{p}}_i + \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&= -\frac{\partial H}{\partial x^i} + \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&= -\frac{\partial }{\partial x^i}\left(H'(\mathbf{x},\mathbf{p}',t) +\frac{\partial F}{\partial t}\right) + \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&=-\frac{\partial H'}{\partial x^i} -\sum_{j}\frac{\partial H'}{\partial p'_j}\frac{\partial p'_j}{\partial x^i} - \frac{\partial }{\partial x^i}\frac{\partial F}{\partial t} + \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&=-\frac{\partial H'}{\partial x^i} -\sum_{j}\dot{\mathbf{x}}^j\frac{\partial }{\partial x^i}\frac{\partial F}{\partial x^j} - \frac{\partial }{\partial x^i}\frac{\partial F}{\partial t} + \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&=-\frac{\partial H'}{\partial x^i}-\frac{\partial }{\partial x^i}\left(\sum_{j}\dot{\mathbf{x}}^j\frac{\partial F}{\partial x^j} +\frac{\partial F}{\partial t}\right) +\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&=-\frac{\partial H'}{\partial x^i} -\frac{\partial }{\partial x^i}\frac{\mathrm{d}F}{\mathrm{d}t} +\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial x^i}\\
&=-\frac{\partial H'}{\partial x^i}.\end{aligned}\]
<p>Then Hamilton’s equations are preserved under the gauge transformation,
and so the dynamics of the system is the same independent of the chosen
gauge.</p>
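<p>A numerical illustration drives the point home. In the sketch below (pure Python, with the unit choices $e=c=m=1$ and the arbitrary illustrative fields $\varphi = x^2/2$, $\mathbf{A}=0$, $\Lambda = xt$, none of which come from the text above), we integrate Hamilton’s equations in both gauges and check that the position trajectories agree while the canonical momenta differ by $(e/c)\partial_x\Lambda$:</p>

```python
e, c, m = 1.0, 1.0, 1.0

def rhs(x, p, t, A, dAdx, dphidx):
    """Hamilton's equations for H = (p - (e/c)A)^2/(2m) + e*varphi."""
    kin = (p - (e / c) * A(x, t)) / m
    return kin, kin * (e / c) * dAdx(x, t) - e * dphidx(x, t)

def integrate(A, dAdx, dphidx, x, p, dt=1e-3, steps=1000):
    """Classic fourth-order Runge-Kutta from t = 0 to t = steps*dt."""
    t = 0.0
    for _ in range(steps):
        k1 = rhs(x, p, t, A, dAdx, dphidx)
        k2 = rhs(x + dt/2*k1[0], p + dt/2*k1[1], t + dt/2, A, dAdx, dphidx)
        k3 = rhs(x + dt/2*k2[0], p + dt/2*k2[1], t + dt/2, A, dAdx, dphidx)
        k4 = rhs(x + dt*k3[0], p + dt*k3[1], t + dt, A, dAdx, dphidx)
        x += dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        p += dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        t += dt
    return x, p

# Original gauge: A = 0, varphi = x^2/2 (a harmonic oscillator).
x1, p1 = integrate(lambda x, t: 0.0, lambda x, t: 0.0, lambda x, t: x,
                   x=1.0, p=0.0)
# Transformed gauge with Lambda = x*t: A' = t, varphi' = x^2/2 - x/c.
# The initial momenta coincide because dLambda/dx vanishes at t = 0.
x2, p2 = integrate(lambda x, t: t, lambda x, t: 0.0, lambda x, t: x - 1.0/c,
                   x=1.0, p=0.0)

assert abs(x1 - x2) < 1e-8          # same physical trajectory
assert abs(p2 - (p1 + 1.0)) < 1e-8  # momenta differ by (e/c)*t at t = 1
```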
<h2 id="finding-the-unitary-transformation">Finding the unitary transformation</h2>
<p>Suppose that $\psi$ is a solution to the Schrödinger equation</p>
\[i\hslash \frac{\partial \psi}{\partial t} = \frac{1}{2m}\left(\mathbf{p}-\frac{e}{c}\mathbf{A}(\mathbf{x},t)\right)^2\psi + e\varphi(\mathbf{x},t)\psi,\]
<p>and suppose that there is a unitary transformation $U(\Lambda)$ such
that $\psi’ = U(\Lambda)\psi$ satisfies the Schrödinger equation with
the transformed potentials:</p>
\[i\hslash \frac{\partial \psi'}{\partial t} = \frac{1}{2m}\left(\mathbf{p}'-\frac{e}{c}\mathbf{A}'(\mathbf{x},t)\right)^2\psi' + e\varphi'(\mathbf{x},t)\psi'.\]
<p>Since $U(\Lambda)$ is unitary, it is a general fact<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> that it can be
written as
\(U(\Lambda) = \exp(iG(\Lambda))\)
for some Hermitian
$G(\Lambda)$. In general, $G$ is going to be a function only of
$\mathbf{x}$ and $t$, since it depends only on $\Lambda$. We want to
find $G$.</p>
<p>Let’s split the Schrödinger equation with transformed potentials into
little bits. On the left-hand side, we have</p>
\[i\hslash \frac{\partial \psi'}{\partial t} = i\hslash\frac{\partial }{\partial t}(\exp(iG)\psi) = -\hslash\frac{\partial G}{\partial t}\exp(iG)\psi +i\hslash\exp(iG)\frac{\partial \psi}{\partial t}.\]
<p>On the right-hand side, we use the fact that</p>
\[\mathbf{p}U = U\mathbf{p} -i\hslash\nabla U = \exp(iG)\mathbf{p} +\hslash\nabla G \exp(iG),\]
<p>so</p>
\[\left(\mathbf{p}'-\frac{e}{c}\mathbf{A}'\right)^2\psi' = \left(\mathbf{p}'-\frac{e}{c}\mathbf{A}'\right)^2\exp(iG)\psi = \exp(iG)\left(\mathbf{p}'-\frac{e}{c}\mathbf{A}' +\hslash\nabla G\right)^2\psi.\]
<p>If we write $\mathbf{A}’=\mathbf{A}+\nabla\Lambda$,
$\varphi’ = \varphi -\frac{1}{c}\partial_t\Lambda$, and
$\mathbf{p’}=\mathbf{p}$, plug everything back in and reorder a little
bit, we obtain</p>
\[\exp(iG)\left(i\hslash\frac{\partial \psi}{\partial t}\right) = \frac{1}{2m}\exp(iG)\left(\mathbf{p}-\frac{e}{c}\mathbf{A}-\frac{e}{c}\nabla \Lambda + \hslash\nabla G\right)^2\psi +e\exp(iG)\varphi\psi + \exp(iG)\left(\hslash\frac{\partial G}{\partial t}-\frac{e}{c}\frac{\partial \Lambda}{\partial t}\right)\psi.\]
<p>This equation looks like the Schrödinger equation for $\psi$, but with
an $\exp(iG)$ in front and a bunch of other things that we want to get
rid of. We would easily get rid of them if</p>
\[\begin{aligned}
\hslash\nabla G &= \frac{e}{c}\nabla\Lambda\\
\hslash\frac{\partial G}{\partial t} &= \frac{e}{c}\frac{\partial \Lambda}{\partial t}.\end{aligned}\]
<p>This is a differential equation for $G$, which has an easy solution<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>:</p>
<p>\(G = \frac{e}{\hslash c}\Lambda.\)
Therefore, if we choose the unitary
transformation to be</p>
<p>\(U(\Lambda) = \exp\left(\frac{ie}{\hslash c}\Lambda\right),\)
then
$\psi’ = U(\Lambda)\psi$ is a solution to the Schrödinger equation with
the potentials $\mathbf{A}’$, $\varphi’$ whenever $\psi$ is a solution
to the equation with potentials $\mathbf{A}$, $\varphi$.</p>
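<p>Since we did all that work, let’s make SymPy redo it. The sketch below (one spatial dimension, generic symbolic functions; the helper names are mine) verifies the operator identity behind the whole argument: applying the transformed Schrödinger equation to $U\psi$ gives exactly $U$ times the original Schrödinger residual, so $\psi'$ solves the new equation whenever $\psi$ solves the old one:</p>

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
e, c, hbar, m = sp.symbols('e c hbar m', positive=True)
A = sp.Function('A', real=True)(x, t)
phi = sp.Function('varphi', real=True)(x, t)
Lam = sp.Function('Lambda', real=True)(x, t)
psi = sp.Function('psi')(x, t)

U = sp.exp(sp.I * e * Lam / (hbar * c))
A_new = A + sp.diff(Lam, x)            # A'
phi_new = phi - sp.diff(Lam, t) / c    # varphi'

def schrodinger_residual(a, ph, f):
    """i hbar df/dt - H f, with H = (p - (e/c)a)^2/(2m) + e*ph."""
    pi = lambda g: -sp.I * hbar * sp.diff(g, x) - (e / c) * a * g
    return sp.I * hbar * sp.diff(f, t) - pi(pi(f)) / (2 * m) - e * ph * f

diff_expr = (schrodinger_residual(A_new, phi_new, U * psi)
             - U * schrodinger_residual(A, phi, psi))
assert sp.simplify(sp.expand(diff_expr)) == 0
```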
<h1 id="references">References</h1>
<ul>
<li>
<p>Landau, L. D. and Lifschitz, E. M. (1977). <em>Quantum Mechanics:
Non-relativistic theory</em>. Chapter XV.</p>
</li>
<li>
<p>Sakurai, J.J. and Napolitano, J. (2011). <em>Modern Quantum Mechanics,
Second Edition</em>, Section 2.7.</p>
</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>We can obtain
$|\left\langle \phi \middle\vert \psi\right\rangle|^2$ as the
expectation value of the projection operator
$\mathrm{pr}_{\psi}=\vert \psi \rangle\left\langle \psi \right\vert$
in the $\phi$ state. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This is in stark contrast to the Lagrangian case, where the
canonical momentum changes as the potentials change. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See references. This is relatively easy to show in the
finite-dimensional case, but quite non-trivial for general Hilbert
spaces! <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Which is not unique, but that doesn’t matter. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Santiago Quintero de los RíosWhat happens if we add a particle to an electromagnetic field? What does it look like in the quantum case? What role do the potentials play in the evolution of the particle?Lectures are obsolete.2020-02-01T00:00:00+01:002020-02-01T00:00:00+01:00http://homotopico.com/2020/02/01/lectures-are-obsolete<p>Hear me out: lectures are obsolete and should be phased in favor of a more active approach to learning.</p>
<p>I’ve been taking lectures for quite a few years now. As the topics have become more advanced, I’ve started to notice diminishing returns with how much I <em>learn</em> during a lecture. I’ve noticed that I have to review almost everything from scratch, using my lecture notes only as a list of topics that were covered during lectures.</p>
<p>Why is it that I don’t get anything from lectures anymore? The most obvious reason is pacing. Most lectures are too fast, and I can’t catch up and get lost. A few are too slow and therefore boring and a waste of time. The second case doesn’t happen too often, so let’s focus on the first one.</p>
<p>When I’m in a lecture, I try to keep up with what the teacher says, mostly by writing things down. I try to see that everything makes sense and that the pieces of the math puzzle fit together. Often, I have a little problem that requires a bit of time, so I have three choices:</p>
<ol>
<li>Ignore the problem and keep up with the lecture.</li>
<li>Try to figure it out and possibly fall behind.</li>
<li>Interrupt the lecture to ask a question and risk public humiliation.</li>
</ol>
<p>If I ignore the problem, then I start to lose my grasp of the lecture, and new things become harder to digest. If I try to figure it out, I may fall behind, and new things become harder to digest. And most of the time these are tiny little problems that require just two or three minutes of thought, so I don’t dare interrupt the lecture. This last point is important, and I’ll come back to it.</p>
<p>The result is that I lose my grasp of the lecture, and by the end of it I’m tired, hungry, and if I’m lucky I remember just the simplest parts of what was taught. When I go home and try to solve exercises I realize that I have a lot of gaps in my understanding, so I sit down and review. Now my memories of the lecture are useless and my notes get worse as the lecture progresses, so I fall back to the standard (and non-standard) literature. Then it’s almost as if I had never gone to the lecture. In fact, huh, some of these books explain things <em>better</em> than the lectures did. Some of these books have more examples, more solved problems! And look, this teacher wrote his own lecture notes, and his lectures are basically a word-for-word reading of them!</p>
<p>So if the books explain things better and I have to review everything again anyway…</p>
<p>…what <em>is</em> the point of going to lectures?</p>
<p>The best lectures I’ve had were ones with small class sizes, where I felt comfortable asking questions and felt like I was adding to a discussion. These were lectures where I felt more like a peer than a pupil, where I felt that there were no barriers between me and the teacher.</p>
<p>That is it. The point of going to lectures is the interaction with the teacher. Talking to a human being who <em>has been there</em> and can try to figure out what your problems are. Getting feedback.</p>
<p>The problem is that old-fashioned lectures don’t incite interactions between the teacher and the students. Even worse, most <em>inhibit</em> them. I’ve had lectures with teachers shutting down students’ questions or answering vaguely (“oh well it’s <em>trivial</em>, isn’t it”). I’ve had lectures where the teacher stands with their back to the class, writing on the blackboard, talking <em>towards</em> the blackboard. In a few of these, the poor souls who dared raise their hands to ask a question had to lower them a few minutes later because they were never noticed.</p>
<p>In general, lectures are just uninviting towards discussions. They are reduced to tepid readings of books or lecture notes. Interactions with students are mostly small questions, a huge chunk of which are of the “there should be a two there, right?” type. Students don’t dare ask the big “I don’t understand” questions because they don’t want to risk feeling stupid in front of their peers.</p>
<p>If there is no interaction between teacher and students, and if the topics of the lectures are readily available somewhere else, then there is almost no point in attending lectures. In most cases, watching a video of a lecture
is just as good as attending one. In fact, it is arguably <em>better</em> because with a video you can go back and forward as you please.</p>
<p>Part of this problem can be solved if students read the material of the lecture beforehand. They will have fewer of the smaller questions, and the larger ones that arise will probably be more important and more likely to start a discussion. If students get acquainted with the topic, they will have more confidence during the lecture, and so they will be more likely to interact with the teacher and the rest of the class.</p>
<p>However, in our traditional model, there is little pretense that students will do <em>any</em> reading at all beforehand. Students expect to be taught everything during the lectures, and teachers essentially conform to that expectation. Since most students (including yours truly) won’t do anything other than what they <em>have</em> to do, there is no incentive for them to prepare anything.</p>
<p>Teachers should <em>actively</em> seek that their students interact with the class, instead of being book-readers that hope that students ask questions. The teaching method should <em>guarantee</em> (not just <em>permit</em>) that students interact with the teacher and each other too. The classroom should be a place of discussion, not of real-time copying of notes from the teacher’s papers to the blackboard to the students’ notebooks. Students learn by <em>doing</em>, not just by listening. Lectures don’t let students <em>do</em>.</p>
<p>We’ve just started the third decade of the 21st century. It is time to let the 19th-century style lecture die. There are <a href="https://en.wikipedia.org/wiki/Active_learning">many different <em>modern</em> approaches</a> to teaching that actually take into account <em>the way humans learn</em>. We can do better than this.</p>Santiago Quintero de los RíosI'm so tired of sitting for two (even three!) hours in a lecture hall, paying attention, trying to keep up, only to have to review absolutely everything at home. Can we do better?What is a gauge field? Part 1: Electromagnetism2019-09-06T00:00:00+02:002019-09-06T00:00:00+02:00http://homotopico.com/gauge-fields/2019/09/06/gauge-fields-01<p>Get this post on <a href="/assets/docs/pdf_posts/gauge-fields-01.pdf">pdf here</a>.</p>
<p>The objective of the following posts is to attempt to give an answer to
the age-old question: What is a gauge field?</p>
<p>That’s a tall order, alright. In order to understand what gauge fields
are, I hope to construct a direct <em>dictionary</em> between classical gauge
fields as physicists know them, and the language of principal bundles
that mathematicians use. The parallel between both is striking, but I
haven’t found an actual dictionary that lets you go straight from one to
the other. And well, since I’m a completionist it doesn’t just suffice
to spell it out, but rather to build it nicely.</p>
<p>This first part does not have too many prerequisites: only the basics
of electromagnetism. I’ll try to be as self-contained as possible in the
physics part. Without further ado, let’s begin.</p>
<h1 id="potentials-for-the-electric-and-magnetic-field">Potentials for the electric and magnetic field</h1>
<p>In Gaussian units, the microscopic (or vacuum) Maxwell equations are</p>
\[\begin{aligned}
\boldsymbol{\nabla}\times\mathbf{E}+\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} &= 0 & \boldsymbol{\nabla}\cdot\mathbf{E} &= 4\pi\rho\\
\boldsymbol{\nabla}\cdot\mathbf{B} &= 0 & \boldsymbol{\nabla}\times\mathbf{B}-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} &= \frac{4\pi}{c}\mathbf{J},\end{aligned}\]
<p>where $\rho$ is the electric charge density and $\mathbf{J}$ is the
electric current density. Here, the fields $\mathbf{E},\mathbf{B}$, and
the current density $\mathbf{J}$ are time-dependent vector fields on
some open subset $U\subseteq\mathbb{R}^3$,</p>
\[\mathbf{E},\mathbf{B},\mathbf{J}:\mathbb{R}\times U \to \mathbb{R}^3,\]
<p>and the charge density $\rho$ is a time-dependent scalar function on
$U$,</p>
\[\rho:\mathbb{R}\times U\to \mathbb{R}.\]
<p>The objective with these
equations is to determine the electric and magnetic fields
$\mathbf{E},\mathbf{B}$, given the source functions $\mathbf{J}$ and
$\rho$ (and boundary conditions and all that so that the PDE is actually
soluble).</p>
<p>Now we have a <em>trick</em> to make it easier to find a solution to the
equations. The trick is to see that we can automatically satisfy the
homogeneous equations (on the left) by a clever rewriting of
$\mathbf{E}$ and $\mathbf{B}$. Indeed, the equation
$\boldsymbol{\nabla}\cdot\mathbf{B}=0$ suggests (but does not
<em>imply</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> ) that we write</p>
\[\mathbf{B} = \boldsymbol{\nabla}\times\mathbf{A},\]
<p>for some other
vector field $\mathbf{A}$, which we call the <strong>magnetic vector
potential</strong>. Once we have this, the other homogeneous equation becomes</p>
\[\boldsymbol{\nabla}\times\mathbf{E}+\frac{1}{c}\frac{\partial }{\partial t}(\boldsymbol{\nabla}\times\mathbf{A}) = \boldsymbol{\nabla}\times\left(\mathbf{E}+\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}\right)=0.\]
<p>Again, this suggests (but does not always imply<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>) that we write</p>
\[\mathbf{E}+\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t} = -\boldsymbol{\nabla}\phi,\]
<p>or rather</p>
\[\mathbf{E} = -\boldsymbol{\nabla}\phi -\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}\]
<p>for some function $\phi$ which we call the <strong>electric potential</strong> (the
negative sign is a convention<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>). For such choices of $\mathbf{A}$ and
$\phi$, the homogeneous Maxwell equations are immediately satisfied. Of
course the choice of $\mathbf{A}$ and $\phi$ must be such that the
inhomogeneous equations are still satisfied, but this reduces the
problem from finding two vector fields $\mathbf{E}$, $\mathbf{B}$
satisfying the full Maxwell equations to finding one scalar field $\phi$
and a vector field $\mathbf{A}$ that satisfy the (admittedly ugly)
equations</p>
\[\begin{aligned}
-\boldsymbol{\nabla}^2\phi - \frac{1}{c}\frac{\partial }{\partial t}(\boldsymbol{\nabla}\cdot\mathbf{A}) &= 4\pi\rho\\
\boldsymbol{\nabla}\times(\boldsymbol{\nabla}\times\mathbf{A}) +\frac{1}{c^2}\frac{\partial ^2\mathbf{A}}{\partial t^2} + \frac{1}{c}\frac{\partial }{\partial t}(\boldsymbol{\nabla}\phi) &= \frac{4\pi}{c}\mathbf{J}.\end{aligned}\]
<p>Of course, the choice of $\mathbf{A}$ and $\phi$ is not <em>unique</em>. Once
we have a choice of $\mathbf{A}$ and $\phi$, then for <em>any</em> smooth
scalar field $\Lambda$, we can change $\mathbf{A}$ as</p>
\[\mathbf{A}' = \mathbf{A} + \boldsymbol{\nabla}\Lambda,\]
<p>and of course
we will still obtain</p>
\[\boldsymbol{\nabla}\times\mathbf{A}' = \boldsymbol{\nabla}\times\mathbf{A} + \boldsymbol{\nabla}\times(\boldsymbol{\nabla}\Lambda) = \boldsymbol{\nabla}\times\mathbf{A} = \mathbf{B}.\]
<p>Then $\mathbf{A}’=\mathbf{A}+\boldsymbol{\nabla}\Lambda$ is another
magnetic vector potential for $\mathbf{B}$. Under this new magnetic
potential, we have for the electric field</p>
\[\mathbf{E} = -\boldsymbol{\nabla}\phi - \frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}=-\boldsymbol{\nabla}\phi - \frac{1}{c}\frac{\partial \mathbf{A}'}{\partial t}+\frac{1}{c}\frac{\partial }{\partial t}(\boldsymbol{\nabla}\Lambda)= -\boldsymbol{\nabla}\left(\phi- \frac{1}{c}\frac{\partial \Lambda}{\partial t}\right) -\frac{1}{c}\frac{\partial \mathbf{A}'}{\partial t},\]
<p>and so if we define a “new” electric potential $\phi’$ as</p>
\[\phi' = \phi -\frac{1}{c}\frac{\partial \Lambda}{\partial t},\]
<p>we
still can write</p>
\[\mathbf{E} = -\boldsymbol{\nabla}\phi'-\frac{1}{c}\frac{\partial \mathbf{A}'}{\partial t}.\]
<p>This tells us that the pair $\mathbf{A}’,\phi’$ is another perfectly
good choice of potentials for $\mathbf{E}$ and $\mathbf{B}$.</p>
<p><strong>In summary</strong>, the homogeneous Maxwell equations suggest that we write
the electric and magnetic fields $\mathbf{E}$, $\mathbf{B}$ in terms of
the potentials $\mathbf{A}$ and $\phi$. Once we have done that, we can
reduce Maxwell’s equations on $\mathbf{E}$ and $\mathbf{B}$ to two
(hopefully easier) equations for the potentials $\mathbf{A},\phi$. Once we
have found such potentials $\mathbf{A},\phi$, we can recover the
electric and magnetic fields as</p>
\[\begin{aligned}
\mathbf{B} &= \boldsymbol{\nabla}\times\mathbf{A},\\
\mathbf{E} &=-\boldsymbol{\nabla}\phi -\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}.\end{aligned}\]
<p>The choice of potentials $\mathbf{A},\phi$ is <em>not unique</em>, since for
any smooth scalar field $\Lambda$, we can define new potentials
$\mathbf{A}’,\phi’$ as</p>
\[\begin{aligned}
\mathbf{A}' &= \mathbf{A} + \boldsymbol{\nabla}\Lambda,\\
\phi' &=\phi -\frac{1}{c}\frac{\partial \Lambda}{\partial t},\end{aligned}\]
<p>and we still obtain the same $\mathbf{E}$, $\mathbf{B}$. When we change
the potentials using a function $\Lambda$ (which remember, can be <em>any</em>
smooth function), we say that we are applying a <strong>gauge transformation</strong>
to the fields, and we say that $\Lambda$ is a <strong>gauge function</strong>. Of
course, since the fields $\mathbf{E}$ and $\mathbf{B}$ do not change
under such transformations, we say that they are <strong>gauge invariant</strong>.
This property is also often called a <em>local symmetry</em>, since we are
applying a “transformation” that does not change the fields (that’s why
it’s a <em>symmetry</em>), and this transformation can be done differently at
every point in space-time (since $\Lambda$ can be <em>any</em> smooth
function). That’s where the word <em>local</em> comes from.</p>
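<p>Since gauge invariance is the punchline of this section, it’s worth checking the computation symbolically as well as by hand. Here’s a quick <code>sympy</code> sketch (my own, not part of the original derivation; the conventions match the formulas above):</p>

```python
import sympy as sp

t, x, y, z, c = sp.symbols('t x y z c', positive=True)
coords = (x, y, z)

# Arbitrary smooth potentials and an arbitrary gauge function.
A = sp.Matrix([sp.Function(f'A{i}')(t, x, y, z) for i in (1, 2, 3)])
phi = sp.Function('phi')(t, x, y, z)
Lam = sp.Function('Lambda')(t, x, y, z)

grad = lambda f: sp.Matrix([sp.diff(f, q) for q in coords])
def curl(V):
    return sp.Matrix([
        sp.diff(V[2], y) - sp.diff(V[1], z),
        sp.diff(V[0], z) - sp.diff(V[2], x),
        sp.diff(V[1], x) - sp.diff(V[0], y)])

# Fields recovered from the potentials.
E = -grad(phi) - sp.diff(A, t) / c
B = curl(A)

# Gauge-transformed potentials.
A2 = A + grad(Lam)
phi2 = phi - sp.diff(Lam, t) / c

# The fields are unchanged under the gauge transformation.
assert sp.simplify(-grad(phi2) - sp.diff(A2, t) / c - E) == sp.zeros(3, 1)
assert sp.simplify(curl(A2) - B) == sp.zeros(3, 1)
```

The check works for <em>any</em> (symbolic) $\Lambda$, which is exactly the point: the symmetry is local.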
<h1 id="in-special-relativistic-notation">In special-relativistic notation</h1>
<p>Let’s go back to square one, and let’s rewrite this more neatly<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> in
the Minkowski spacetime of special relativity. The space we are working
in is $M=\mathbb{R}\times U\subseteq \mathbb{R}^4$, with global
coordinates</p>
\[x^0=ct,\qquad x^1=x,\qquad x^2=y,\qquad x^3=z,\]
<p>where
$c$ is the speed of light in your favorite units. We also have a metric
$\eta$ given in coordinates as</p>
\[\eta = \mathrm{d}{x^0}\otimes\mathrm{d}{x^0} - \sum_{i=1}^3\mathrm{d}{x^i}\otimes\mathrm{d}{x^i} = \eta_{\mu\nu}\mathrm{d}x^\mu \otimes\mathrm{d}x^{\nu}.\]
<p>Here we used Einstein’s notation, and we will follow the usual
conventions of raising and lowering indices for the isomorphism
$TM\cong T^*M$ induced by the metric<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>
<p>Now we (rather arbitrarily) define a $2$-form $F\in \Omega^2(M)$, called
the <strong>Faraday</strong> or <strong>electromagnetic tensor</strong> whose components with
respect to these coordinates are</p>
\[[F_{\mu\nu}] =
\begin{pmatrix}
0 & \mathbf{E}^1 & \mathbf{E}^2 & \mathbf{E}^3\\
-\mathbf{E}^1 & 0 & -\mathbf{B}^3 & \mathbf{B}^2\\
-\mathbf{E}^2 & \mathbf{B}^3 & 0 & -\mathbf{B}^1\\
-\mathbf{E}^3 & -\mathbf{B}^2 & \mathbf{B}^1 & 0
\end{pmatrix}.\]
<p>This definition, at this point, is quite arbitrary, but
it can be shown<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> that it is a <em>somewhat</em> natural construction that
pops up in Maxwell’s equations. A note on notation: Let’s think of the
bold symbols as overriding Einstein’s notation. This equation should be
seen literally, component-wise, e.g. $F_{01}=\mathbf{E}^1$, and of
course the Einstein notation doesn’t add up here. It doesn’t matter too
much at this point<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>
<p>A key feature of the electromagnetic tensor is that it is a <em>closed</em>
$2$-form, that is, its de Rham differential vanishes. The computation is
a bit tedious but we’ll give a few components just so that this is not
completely blind faith. Recall that the de Rham differential of a
$k$-form $\omega\in \Omega^k(M)$ is a $(k+1)$-form with components</p>
\[(\mathrm{d}\omega)_{\mu\nu_1\dots\nu_k}= (k+1)\partial_{[\mu}\omega_{\nu_1\dots\nu_k]}.\]
<p>The bracket stands for the total antisymmetrization of the indices in
it. Applying this to the Faraday tensor, we obtain</p>
\[(\mathrm{d}{F})_{012}=\frac{\partial F_{12}}{\partial x^0} -\frac{\partial F_{02}}{\partial x^1} +\frac{\partial F_{01}}{\partial x^2} = -\frac{1}{c}\frac{\partial \mathbf{B}^3}{\partial t} -\frac{\partial \mathbf{E}^2}{\partial x^1}+\frac{\partial \mathbf{E}^1}{\partial x^2} = -\left(\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} +\boldsymbol{\nabla}\times\mathbf{E}\right)^3,\]
<p>(that is, the third component of the equation, not the equation cubed)
and similarly for the components $(\mathrm{d}{F})_{013}$ and
$(\mathrm{d}{F})_{023}$. For the last component,</p>
\[(\mathrm{d}{F})_{123}=\frac{\partial F_{23}}{\partial x^1} -\frac{\partial F_{13}}{\partial x^2} +\frac{\partial F_{12}}{\partial x^3} = -\frac{\partial \mathbf{B}^1}{\partial x^1}-\frac{\partial \mathbf{B}^2}{\partial x^2}-\frac{\partial \mathbf{B}^3}{\partial x^3} = -\boldsymbol{\nabla}\cdot\mathbf{B}.\]
<p>If we compute the other two components we will see that the components
of $\mathrm{d}{F}$ are precisely the components of the homogeneous
Maxwell equations. Therefore, we have that</p>
\[\mathrm{d}{F}=0\qquad\Leftrightarrow\qquad
\begin{aligned}
\boldsymbol{\nabla}\times\mathbf{E}+\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} &= 0\\
\boldsymbol{\nabla}\cdot\mathbf{B} &= 0
\end{aligned}.\]
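<p>If you don’t feel like checking the remaining components by hand, a symbolic computation confirms the claim. This is my own <code>sympy</code> sketch (with $\partial_0 = \tfrac{1}{c}\partial_t$, as above):</p>

```python
import sympy as sp

x0, x1, x2, x3 = sp.symbols('x0 x1 x2 x3')
X = (x0, x1, x2, x3)
E = [sp.Function(f'E{i}')(*X) for i in (1, 2, 3)]
B = [sp.Function(f'B{i}')(*X) for i in (1, 2, 3)]

# Faraday tensor components F_{mu nu}, exactly as in the matrix above.
F = sp.Matrix([
    [0,     E[0],  E[1],  E[2]],
    [-E[0], 0,    -B[2],  B[1]],
    [-E[1], B[2],  0,    -B[0]],
    [-E[2], -B[1], B[0],  0]])

def dF(m, n, l):
    # (dF)_{mnl} = del_m F_{nl} + del_n F_{lm} + del_l F_{mn}
    return (sp.diff(F[n, l], X[m]) + sp.diff(F[l, m], X[n])
            + sp.diff(F[m, n], X[l]))

# (dF)_{012} = -( del_0 B^3 + (curl E)^3 ), i.e. minus Faraday's law, 3rd component:
curlE3 = sp.diff(E[1], x1) - sp.diff(E[0], x2)
assert sp.simplify(dF(0, 1, 2) + sp.diff(B[2], x0) + curlE3) == 0

# (dF)_{123} = -div B:
divB = sum(sp.diff(B[i], X[i + 1]) for i in range(3))
assert sp.simplify(dF(1, 2, 3) + divB) == 0
```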
<p>Thus, if we assume that Maxwell’s equations hold,
then $F$ is a closed $2$-form, and this suggests<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> we write</p>
\[F = \mathrm{d}{A}\]
<p>for some $1$-form $A\in \Omega^1(M)$, called the
<strong>electromagnetic potential</strong>. In components, this is</p>
\[F_{\mu\nu} = \frac{\partial A_\nu}{\partial x^\mu}-\frac{\partial A_\mu}{\partial x^\nu},\]
<p>where $A = A_\mu\mathrm{d}{x^\mu}$. This electromagnetic potential
corresponds to the electric and magnetic potentials $\phi,\mathbf{A}$ as</p>
\[A_0 = \phi \qquad A_i = -\mathbf{A}^i.\]
<p>The annoying sign for the
spatial indices tells us that $A$ should be more naturally thought of as
a <em>vector field</em> instead of a $1$-form. Raise that index!<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></p>
\[A^0 = \phi \qquad A^i = \mathbf{A}^i\]
<p>Ah, much better. Indeed, we
have for $i\geq 1$,</p>
\[F_{0i}=\mathbf{E}^i=\frac{\partial A_i}{\partial x^0}-\frac{\partial A_0}{\partial x^i} = \left(-\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}-\boldsymbol{\nabla}\phi \right)^i,\]
<p>and for instance,</p>
\[F_{21}=\mathbf{B}^3 = \frac{\partial A_1}{\partial x^2}-\frac{\partial A_2}{\partial x^1}=\left(\boldsymbol{\nabla}\times\mathbf{A}\right)^3.\]
<p>This tells us that the choice of a primitive $A$ for $F$ such that
$\mathrm{d}A = F$ is exactly the same as choosing potentials
$\phi,\mathbf{A}$ for the electric and magnetic fields $\mathbf{E}$,
$\mathbf{B}$ as in the previous section.</p>
<p>Once again, we have that the choice of electromagnetic potential is not
unique, since we can add to $A$ any exact $1$-form
$\mathrm{d}{\Lambda}$ (for a function $\Lambda\in C^{\infty}(M)$) and
still obtain the same electromagnetic tensor $F$. If
$A’=A-\mathrm{d}{\Lambda}$, then</p>
\[\mathrm{d}{A'}=\mathrm{d}(A-\mathrm{d}{\Lambda})=\mathrm{d}{A} -\mathrm{d}^2\Lambda = \mathrm{d}{A} =F.\]
<p>In components, $A’$ looks like</p>
\[\begin{aligned}
A'_0&=\phi'=A_0-\frac{\partial \Lambda}{\partial x^0}=\phi -\frac{1}{c}\frac{\partial \Lambda}{\partial t},\\
A'_i&={-\mathbf{A}'}^i = A_i-\frac{\partial \Lambda}{\partial x^i}=-(\mathbf{A} +\boldsymbol{\nabla}\Lambda)^i.\end{aligned}\]
<p>Thus we recover the same equations for a gauge transformation:</p>
\[A' = A - \mathrm{d}\Lambda \qquad \Leftrightarrow \qquad \begin{aligned}
\mathbf{A}' &= \mathbf{A} + \boldsymbol{\nabla}\Lambda,\\
\phi' &=\phi -\frac{1}{c}\frac{\partial \Lambda}{\partial t}
\end{aligned}\]
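<p>And again, this is quick to verify symbolically: $\mathrm{d}^2 = 0$ does all the work. A short <code>sympy</code> sketch of my own:</p>

```python
import sympy as sp

X = sp.symbols('x0 x1 x2 x3')
A = [sp.Function(f'A{m}')(*X) for m in range(4)]
Lam = sp.Function('Lambda')(*X)

def F(pot):
    # F_{mu nu} = del_mu A_nu - del_nu A_mu, i.e. the components of dA.
    return sp.Matrix(4, 4,
                     lambda m, n: sp.diff(pot[n], X[m]) - sp.diff(pot[m], X[n]))

# Gauge-transformed potential A' = A - dLambda.
A2 = [A[m] - sp.diff(Lam, X[m]) for m in range(4)]

# Same field strength: dA' = dA, since mixed partials of Lambda commute.
assert sp.simplify(F(A2) - F(A)) == sp.zeros(4, 4)
```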
<p>Again, to summarize, we can put together the electric and magnetic
fields into a $2$-form $F$, the electromagnetic tensor, which satisfies</p>
\[\mathrm{d}{F}=0.\]
<p>This equation is automatically satisfied if there
is a $1$-form $A$ such that $F=\mathrm{d}{A}$. In this case we call $A$
an electromagnetic potential for $F$. The choice of potential is not
unique, for we can add any exact $1$-form $\mathrm{d}{\Lambda}$ to $A$
and obtain the same electromagnetic tensor. The new electromagnetic
potential is, then</p>
\[A' = A - \mathrm{d}{\Lambda}.\]
<p>This is called a
<strong>gauge transformation</strong>, and the ability to change the potentials is
called a <strong>gauge freedom</strong> (or symmetry).</p>
<p>What about the inhomogeneous Maxwell equations? Well, that’s a little
bit more tricky. Let’s write all the equations again:</p>
\[\begin{aligned}
\boldsymbol{\nabla}\times\mathbf{E}+\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} &= 0 & \boldsymbol{\nabla}\times\mathbf{B}-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} &= \frac{4\pi}{c}\mathbf{J}\\
\boldsymbol{\nabla}\cdot\mathbf{B} &= 0 & \boldsymbol{\nabla}\cdot\mathbf{E} &= 4\pi\rho.\end{aligned}\]
<p>Note that in the inhomogeneous equations, the roles of $\mathbf{E}$ and
$\mathbf{B}$ seem to be reversed, except for a sneaky negative sign.
What is this inhomogeneous equation in terms of the electromagnetic
tensor $F$?</p>
<p>If you’ve already seen this then: 1. why are you even reading this post
and 2. you already know that the inhomogeneous equations, <em>in
components</em>, take the form</p>
\[\frac{\partial F^{\mu\nu}}{\partial x^{\mu}}=4\pi J^\nu,\]
<p>where $J$
is a vector field whose components are</p>
\[J^0 = \rho; \qquad J^i = \frac{1}{c}\mathbf{J}^i,~~\text{for }i\geq 1.\]
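<p>Before hunting for the coordinate-free version, we can at least confirm that this component equation really does reproduce Gauss’s and Ampère–Maxwell’s laws. A <code>sympy</code> sketch (mine; the index gymnastics follow the conventions above):</p>

```python
import sympy as sp

X = sp.symbols('x0 x1 x2 x3')
E = [sp.Function(f'E{i}')(*X) for i in (1, 2, 3)]
B = [sp.Function(f'B{i}')(*X) for i in (1, 2, 3)]
eta = sp.diag(1, -1, -1, -1)

# F_{mu nu} as before; raise both indices: F^{mu nu} = eta F eta.
Fdn = sp.Matrix([
    [0,     E[0],  E[1],  E[2]],
    [-E[0], 0,    -B[2],  B[1]],
    [-E[1], B[2],  0,    -B[0]],
    [-E[2], -B[1], B[0],  0]])
Fup = eta * Fdn * eta

# del_mu F^{mu nu} for each nu.
div4 = [sum(sp.diff(Fup[m, n], X[m]) for m in range(4)) for n in range(4)]

# nu = 0 gives Gauss's law: del_mu F^{mu 0} = div E (= 4 pi rho):
divE = sum(sp.diff(E[i], X[i + 1]) for i in range(3))
assert sp.simplify(div4[0] - divE) == 0

# nu = 3 gives the 3rd Ampere-Maxwell component: (curl B)^3 - del_0 E^3:
amp3 = sp.diff(B[1], X[1]) - sp.diff(B[0], X[2]) - sp.diff(E[2], X[0])
assert sp.simplify(div4[3] - amp3) == 0
```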
<p>Okay this is good and all, but we want a coordinate-free way to write
this. Let’s try to reverse-engineer the equation. We have that $F$ is a
$2$-form, and we want to relate it via some sort of “divergence” with a
<em>vector field</em>. Instead of that, we can simply convert the vector field
$J$ into a $1$-form using the metric, but still we need to take a
derivative of $F$. The problem is that the de Rham differential
$\mathrm{d}$ will annihilate $F$, and even if it didn’t, it would turn
$F$ into a $3$-form. No bueno!</p>
<p>Instead we want to find a way to switch the roles of $\mathbf{E}$ and
$\mathbf{B}$ in the Faraday tensor, so that the resulting tensor does
not vanish when we apply the exterior differential. We would then have a
three-form, which we want to somehow relate to the current one-form.</p>
<p>If this sounds familiar to you, then you’ve probably heard of the
<strong>Hodge dual</strong> or Hodge star operator. Briefly, if you have a metric $g$
on a manifold $M$ then there is an isomorphism
$\star:\Omega^{k}(M)\to\Omega^{n-k}(M)$, such that for all $k$-forms
$\alpha,\beta$</p>
\[\alpha\wedge\star\beta = g(\alpha,\beta)\mathrm{vol},\]
<p>where $\mathrm{vol}$ is the volume form associated to the metric and the
metric evaluated on $k$-forms is defined as</p>
\[g(\alpha,\beta) := \frac{1}{k!}\alpha^{\mu_1\dots\mu_k}\beta_{\mu_1\dots\mu_k}.\]
<p>We’ve discussed the Hodge star in depth in a <a href="https://www.homotopico.com/2019/06/10/hodge-star.html">previous
post.</a> It can be
shown that, in coordinates, the components of the Hodge star of a
$k$-form $\beta$ are</p>
\[(\star\beta)_{\lambda_1\dots\lambda_{n-k}}= \pm\frac{\sqrt{|\det(g)|}}{k!}\beta^{\rho_1\dots\rho_k}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}},\]
<p>where $\epsilon$ is the Levi-Civita symbol. In particular, we will care
about the stars of wedges of the basis one-forms $\mathrm{d}{x}^\mu$. In
the <a href="https://www.homotopico.com/2019/06/10/hodge-star.html">previous
post</a> we showed
that if $\{e^1,\dots,e^n\}$ is an orthonormal basis then</p>
\[\star(e^{\rho_1}\wedge\dots\wedge e^{\rho_k})=g^{\rho_1\rho_1}\dots g^{\rho_k\rho_k}\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}e^{\nu_1}\wedge\dots\wedge e^{\nu_{n-k}}\qquad\text{(no Einstein sum)},\]
<p>where $\{\nu_1\dots\nu_{n-k}\}$ is the complement of
$\{\rho_1,\dots,\rho_k\}$ in $\{0,\dots,n-1\}$. In
our case, $k=2$ and $n=4$ and the one-forms $\mathrm{d}{x}^\mu$ are
orthonormal, so that</p>
\[\star(\mathrm{d}{x}^0\wedge\mathrm{d}{x}^1)=\eta^{00}\eta^{11}\epsilon^{0123}\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3 = -\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3.\]
<p>In a similar fashion, we can show that</p>
\[\begin{aligned}
\star(\mathrm{d}{x}^0\wedge\mathrm{d}{x}^2)&=\mathrm{d}{x^1}\wedge\mathrm{d}{x}^3\\
\star(\mathrm{d}{x}^0\wedge\mathrm{d}{x}^3)&=-\mathrm{d}{x^1}\wedge\mathrm{d}{x}^2\\
\star(\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2)&=\mathrm{d}{x^0}\wedge\mathrm{d}{x}^3\\
\star(\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3)&=\mathrm{d}{x^0}\wedge\mathrm{d}{x}^1\\
\star(\mathrm{d}{x}^3\wedge\mathrm{d}{x}^1)&=\mathrm{d}{x^0}\wedge\mathrm{d}{x}^2.\end{aligned}\]
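<p>These six signs are easy to get wrong, so here’s a short script that recomputes them from the orthonormal-basis formula above (my own sketch, with the Levi-Civita symbol normalized so that $\epsilon^{0123}=+1$):</p>

```python
import sympy as sp
from sympy import LeviCivita

eta = [1, -1, -1, -1]  # diagonal Minkowski metric, signature (+,-,-,-)

def star_basis2(r1, r2):
    """Coefficient and index pair (n1 < n2) of the Hodge star of dx^r1 ^ dx^r2."""
    n1, n2 = sorted(set(range(4)) - {r1, r2})
    coef = eta[r1] * eta[r2] * LeviCivita(r1, r2, n1, n2)
    return coef, (n1, n2)

assert star_basis2(0, 1) == (-1, (2, 3))  # star(dx0^dx1) = -dx2^dx3
assert star_basis2(0, 2) == (1, (1, 3))   # star(dx0^dx2) =  dx1^dx3
assert star_basis2(0, 3) == (-1, (1, 2))  # star(dx0^dx3) = -dx1^dx2
assert star_basis2(1, 2) == (1, (0, 3))   # star(dx1^dx2) =  dx0^dx3
assert star_basis2(2, 3) == (1, (0, 1))   # star(dx2^dx3) =  dx0^dx1
assert star_basis2(3, 1) == (1, (0, 2))   # star(dx3^dx1) =  dx0^dx2
```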
<p>Thus, if we rewrite the Faraday tensor as</p>
\[\begin{aligned}
F &= \mathbf{E}^1\mathrm{d}{x}^0\wedge\mathrm{d}{x}^1+\mathbf{E}^2\mathrm{d}{x}^0\wedge\mathrm{d}{x}^2+\mathbf{E}^3\mathrm{d}{x}^0\wedge\mathrm{d}{x}^3\\
&\phantom{=}-\mathbf{B}^1\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3-\mathbf{B}^2\mathrm{d}{x}^3\wedge\mathrm{d}{x}^1-\mathbf{B}^3\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2,\end{aligned}\]
<p>we obtain</p>
\[\begin{aligned}
\star F&=-\mathbf{E}^1\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3 +\mathbf{E}^2\mathrm{d}{x}^1\wedge\mathrm{d}{x}^3 - \mathbf{E}^3\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2\\
&\phantom{=}-\mathbf{B}^1\mathrm{d}{x}^0\wedge\mathrm{d}{x}^1 - \mathbf{B}^2\mathrm{d}{x}^0\wedge\mathrm{d}{x}^2 - \mathbf{B}^3\mathrm{d}{x}^0\wedge\mathrm{d}{x}^3,\end{aligned}\]
<p>which in matrix form is</p>
\[[(\star F)_{\mu\nu}] =
\begin{pmatrix}
0 & -\mathbf{B}^1 & -\mathbf{B}^2 & -\mathbf{B}^3\\
\mathbf{B}^1 & 0 & -\mathbf{E}^3 & \mathbf{E}^2\\
\mathbf{B}^2 & \mathbf{E}^3 & 0 & -\mathbf{E}^1\\
\mathbf{B}^3 & -\mathbf{E}^2 & \mathbf{E}^1 & 0
\end{pmatrix}.\]
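<p>You can double-check this matrix against the component formula for the Hodge star. Here’s a <code>sympy</code> sketch (my own; I take $\epsilon_{0123}=+1$, and $\sqrt{|\det\eta|}=1$):</p>

```python
import sympy as sp
from sympy import LeviCivita

E1, E2, E3, B1, B2, B3 = sp.symbols('E1 E2 E3 B1 B2 B3')
eta = sp.diag(1, -1, -1, -1)

F = sp.Matrix([
    [0,   E1,  E2,  E3],
    [-E1, 0,  -B3,  B2],
    [-E2, B3,  0,  -B1],
    [-E3, -B2, B1,  0]])

# Raise both indices (eta is its own inverse): F^{mu nu}.
Fup = eta * F * eta

# (star F)_{mu nu} = (1/2) eps_{mu nu rho sigma} F^{rho sigma}.
starF = sp.Matrix(4, 4, lambda m, n: sp.Rational(1, 2) * sum(
    LeviCivita(m, n, r, s) * Fup[r, s] for r in range(4) for s in range(4)))

# The matrix above: E and B swapped, with the sneaky signs.
expected = sp.Matrix([
    [0,   -B1, -B2, -B3],
    [B1,  0,  -E3,  E2],
    [B2,  E3,  0,  -E1],
    [B3, -E2,  E1,  0]])
assert sp.simplify(starF - expected) == sp.zeros(4, 4)
```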
<p>Thus, we have that the roles of $\mathbf{B}$ and
$\mathbf{E}$ in $\star F$ are reversed from those in $F$, with a sneaky
negative sign. Morally, applying the $\star$ operator gives</p>
\[\begin{aligned}
F &\overset{\star}{\mapsto} \star F\\
\mathbf{B}&\mapsto \mathbf{E}\\
\mathbf{E}&\mapsto -\mathbf{B}.\end{aligned}\]
<p>Now we have that
$\mathrm{d}(\star F)$ is a <em>three</em>-form, some of whose components are</p>
\[(\mathrm{d}\star{F})_{012}=\frac{\partial }{\partial x^0}(\star F)_{12} -\frac{\partial }{\partial x^1}(\star F)_{02} +\frac{\partial }{\partial x^2}(\star F)_{01} = -\frac{1}{c}\frac{\partial \mathbf{E}^3}{\partial t} +\frac{\partial \mathbf{B}^2}{\partial x^1}-\frac{\partial \mathbf{B}^1}{\partial x^2} = \left(-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} +\boldsymbol{\nabla}\times\mathbf{B}\right)^3,\]
<p>which is</p>
\[(\mathrm{d}\star F)_{012} = \left(-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} +\boldsymbol{\nabla}\times\mathbf{B}\right)^3 = \frac{4\pi}{c}\mathbf{J}^3.\]
<p>Similarly,</p>
\[(\mathrm{d}\star{F})_{123}=\frac{\partial }{\partial x^1}(\star F)_{23} -\frac{\partial }{\partial x^2}(\star F)_{13} +\frac{\partial }{\partial x^3}(\star F)_{12} = -\frac{\partial \mathbf{E}^1}{\partial x^1}-\frac{\partial \mathbf{E}^2}{\partial x^2}-\frac{\partial \mathbf{E}^3}{\partial x^3} = -\boldsymbol{\nabla}\cdot\mathbf{E} = -4\pi \rho.\]
<p>If we put it all together, we obtain</p>
\[\begin{aligned}
\mathrm{d}\star F &= \left(-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} +\boldsymbol{\nabla}\times\mathbf{B}\right)^3\mathrm{d}{x}^0\wedge\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2 + \left(-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} +\boldsymbol{\nabla}\times\mathbf{B}\right)^1\mathrm{d}{x}^0\wedge\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3\\
&\phantom{=} + \left(-\frac{1}{c}\frac{\partial \mathbf{E}}{\partial t} +\boldsymbol{\nabla}\times\mathbf{B}\right)^2\mathrm{d}{x}^0\wedge\mathrm{d}{x}^3\wedge\mathrm{d}{x}^1 - (\boldsymbol{\nabla}\cdot{\mathbf{E}})\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3\\
&= \frac{4\pi}{c}\left(\mathbf{J}^3\mathrm{d}{x}^0\wedge\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2 + \mathbf{J}^2\mathrm{d}{x}^0\wedge\mathrm{d}{x}^3\wedge\mathrm{d}{x}^1 + \mathbf{J}^1\mathrm{d}{x}^0\wedge\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3\right) - 4\pi\rho\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3\end{aligned}\]
<p>We see that on the right-hand side we have the components of the
$4$-current $J$, but as a <em>three</em>-form. If we let $j$ be the one-form
with components $J_\mu$ (i.e., with the lowered index), we have</p>
\[j = \rho\mathrm{d}{x}^0 - \frac{1}{c}\left(\mathbf{J}^1\mathrm{d}{x}^1 + \mathbf{J}^2\mathrm{d}{x}^2 +\mathbf{J}^3\mathrm{d}{x}^3\right),\]
<p>so that</p>
\[\star j = \rho\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3-\frac{1}{c}\left(\mathbf{J}^3\mathrm{d}{x}^0\wedge\mathrm{d}{x}^1\wedge\mathrm{d}{x}^2 + \mathbf{J}^2\mathrm{d}{x}^0\wedge\mathrm{d}{x}^3\wedge\mathrm{d}{x}^1 + \mathbf{J}^1\mathrm{d}{x}^0\wedge\mathrm{d}{x}^2\wedge\mathrm{d}{x}^3\right).\]
<p>Thus, we identify:</p>
\[\mathrm{d}\star F = -4\pi \star j.\]
<p>Applying $\star$ once more (in this signature, $\star\star$ is the
identity on $1$- and $3$-forms), this can also be written as</p>
\[\star\mathrm{d}\star F = -4\pi j.\]
<p>Finally, we obtain
the Maxwell equations written in a coordinate-free way in terms of
differential forms:</p>
\[\begin{aligned}
\mathrm{d}F &= 0, & \star\mathrm{d}\star F &=-4\pi j.\end{aligned}\]
<p>These are the <em>field equations</em> of the electromagnetic field. The
homogeneous equation $\mathrm{d}{F}=0$ encodes Faraday’s law and the
absence of magnetic charges, and it is what allows us to introduce the
potential $A$ (at least locally). The inhomogeneous equation
$\star\mathrm{d}\star F = -4\pi j$ tells us how the electromagnetic field
$F$ responds to the presence of charges and currents (represented by
$j$). It also encodes <em>conservation of charge</em>: applying
$\mathrm{d}$ to $\mathrm{d}\star F = -4\pi\star j$ and using
$\mathrm{d}^2=0$ gives $\mathrm{d}\star j = 0$, which is the continuity
equation. The form of these equations is typical of a gauge field, as we
shall see in future posts.</p>
<h1 id="takeaway-and-future">Takeaway and future</h1>
<p>We saw that the homogeneous Maxwell equations allow us (in some cases,
depending on the topology of the underlying space) to write the
electric and magnetic fields $\mathbf{E},\mathbf{B}$ in terms of simple,
auxiliary potentials $\phi,\mathbf{A}$. The choice of these potentials
is not unique, and the fields $\mathbf{E},\mathbf{B}$ remain invariant
under certain transformations of the potentials, called <em>gauge
transformations</em>. At this point, the potentials are no more than
auxiliary mathematical objects that help us solve Maxwell’s equations,
but we shall see that they can be attributed a physical interpretation
in the quantum case.</p>
<p>We also saw a unifying description of the electric and magnetic fields
into an electromagnetic tensor in $4D$ spacetime, and we rewrote
Maxwell’s equations in a neater form that is typical of gauge fields (as
we shall see in the future).</p>
<p>What comes next is adding <em>matter</em> to this whole issue: introducing
objects that can interact with the electromagnetic field. We will also
see a Lagrangian description of the field equations, which again will be
typical of gauge fields.</p>
<h1 id="references">References</h1>
<ul>
<li>
<p>Jackson, J. D. (1998). <em>Classical Electrodynamics, Third Edition</em>.
Chapter 6, explains it <em>way</em> better than I do.</p>
</li>
<li>
<p>Báez, J. C. and Muniain, J. P. (1994). <em>Gauge Fields, Knots, and
Gravity</em>, World Scientific. Section I.5.</p>
</li>
<li>
<p>Fecko, M. (2006). <em>Differential Geometry and Lie Groups for
Physicists</em>. Cambridge University Press.</p>
</li>
</ul>
<p>Thanks to Laura Arboleda for checking style and consistency <3</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>See the <a href="https://www.homotopico.com/2019/06/10/hodge-star.html">previous
post</a> on the
Hodge star. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See the <a href="https://www.homotopico.com/2019/06/10/hodge-star.html">previous
post</a> on the
Hodge star. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>which does have a neat physical interpretation in terms of energy,
but which we shall not discuss. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>For our purposes, this rewriting is simply for the sake of making
everything clearer. What we are really doing is rewriting the
equations of electromagnetism in the language of special relativity.
It is no coincidence that this amounts just to a <em>rewriting</em> without
any modifications to the equations of electromagnetism: special
relativity was essentially <em>made to work</em> with classical
electromagnetism. See the end of Jackson (or any decent book on
relativity) for more details on special-relativistic
electromagnetism. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>For more details on this check any decent book on relativity, for
example Carroll, D’Inverno or Schutz. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Uhh stay tuned for another post, I guess? <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>This can be fixed by introducing new $4$-vectors $E,B$ with
$E^0=0,E^i=\mathbf{E}^i$ (same for $B$), and then <em>lowering</em> the
index and <em>defining</em> $F_{0i}:=-E_i=E^i=\mathbf{E}^i$ (and
equivalently for $B$) but that’s too much work and potentially more
confusing. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>But does not imply! Again, this depends on $H^2(M)\cong H^2(U)$
being trivial. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Index gymnastics with the Minkowski metric is quite easy: In my
convention (the one true convention, fight me) with positive time
and negative space, the time index remains the same while the space
indices gain a negative sign whenever they are raised or lowered (as
can be easily checked). <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Santiago Quintero de los RíosThis is the first post in a series that tries to discover what a gauge field is. The best place to begin is in the easiest gauge theory, namely electromagnetism.The Hodge star for people not quite in a hurry2019-06-10T00:00:00+02:002019-06-10T00:00:00+02:00http://homotopico.com/2019/06/10/hodge-star<p>Get this post in <a href="/assets/docs/pdf_posts/hodge-star.pdf">pdf format here</a>.</p>
<p>Suppose that you have a $k$-form $\alpha$ on a manifold $M$ of dimension
$n$. But you’re a spoiled brat and don’t like $k$-forms but rather
$(n-k)$-forms, so you go to your dad and throw a tantrum, and your dad
says shhh, $k$-forms are okay, see? $k$-forms and $(n-k)$-forms are
basically the same, you see, $\Omega^{k}(M)$ and $\Omega^{n-k}(M)$ even
have the same dimension, it’s basically the same thing sweetie stop
crying everyone’s staring at us please, but you won’t have it because
all your friends have $(n-k)$-forms and keep crying, and your dad
already spent a lot of money in your $k$-form and $(n-k)$-forms are so
expensive! What can your dad do now? He takes your dumb $k$-form to his
workshop and comes out three hours later with a shiny $(n-k)$-form, and
hands it to you smiling but he’s regretting having children, nay, having
<em>you</em> at this point. Oh wow dad, that’s perfect thank you so much you’re
the best dad, how did you do it? Well I told you, $k$-forms and
$(n-k)$-forms are not that different, you just need a metric and some
patience and you can turn one into the other.</p>
<h1 id="metric-on-the-exterior-algebra">Metric on the exterior algebra</h1>
<p>Let $V$ be a finite-dimensional vector space of dimension $n$, and $g$ a
(possibly indefinite) metric on $V$, i.e. a symmetric, non-degenerate bilinear map
$g:V\times V\to {\mathbb{R}}$. We can extend $g$ bilinearly to
$\Lambda^kV$ for any $k$ as</p>
\[g(u_1\wedge\dots\wedge u_k,w_1\wedge\dots\wedge w_k) = \det([g(u_i,w_j)]),\]
<p>where $u_1,\dots,u_k,w_1,\dots,w_k\in V$ and $[g(u_i,w_j)]$ is a matrix
whose $(i,j)$-th entry is $g(u_i,w_j)$. For example,</p>
\[g(u_1\wedge u_2,w_1\wedge w_2)=g(u_1,w_1)g(u_2,w_2)-g(u_1,w_2)g(u_2,w_1).\]
<p>Now let $\alpha,\beta \in \Lambda^k V$. With respect to some basis
$u^1,\dots,u^n$ of $V$ (not necessarily orthonormal), we can write</p>
\[\begin{aligned}
\alpha &= \frac{1}{k!}\alpha_{\mu_1\dots\mu_k}u^{\mu_1}\wedge\dots\wedge u^{\mu_k}\\
\beta &= \frac{1}{k!}\beta_{\mu_1\dots\mu_k}u^{\mu_1}\wedge\dots\wedge u^{\mu_k},\end{aligned}\]
<p>and thus we can find (in Einstein’s notation), writing
$g^{ij}=g(u^i,u^j)$ for the components of the metric in this basis,</p>
\[\begin{aligned}
g(\alpha,\beta)&= \frac{1}{(k!)^2} \alpha_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_k}\mathrm{det} ( [g(u^{\mu_i},u^{\nu_j})])\\
&= \frac{1}{(k!)^2} \alpha_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_k}\sum_{\sigma\in \mathfrak{S}_k}{\mathrm{sgn}}(\sigma)g^{\mu_1\nu_{\sigma(1)}}\dots g^{\mu_k\nu_{\sigma(k)}}\\
&= \frac{1}{(k!)^2}\sum_{\sigma\in \mathfrak{S}_k} \alpha_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_k}{\mathrm{sgn}}(\sigma)g^{\mu_1\nu_{\sigma(1)}}\dots g^{\mu_k\nu_{\sigma(k)}}\\
&=\frac{1}{(k!)^2}\sum_{\sigma\in \mathfrak{S}_k}{\mathrm{sgn}}(\sigma) \alpha^{\nu_{\sigma(1)}\dots\nu_{\sigma(k)}}\beta_{\nu_1\dots\nu_k}\\
&=\frac{1}{(k!)^2}\sum_{\sigma\in \mathfrak{S}_k}\alpha^{\nu_{1}\dots\nu_{k}}\beta_{\nu_1\dots\nu_k}\\
&=\frac{1}{k!}\alpha^{\nu_{1}\dots\nu_{k}}\beta_{\nu_1\dots\nu_k}.
\end{aligned}\]
<p>Here we used the fact that the components of a form are totally
antisymmetric, so for any permutation $\sigma\in \mathfrak{S}_k$</p>
\[\alpha_{\mu_{\sigma(1)}\dots\mu_{\sigma(k)}}={\mathrm{sgn}}(\sigma)\alpha_{\mu_1\dots\mu_k}.\]
<p>With this result we can see that
$g:\Lambda^kV\times\Lambda^kV\to {\mathbb{R}}$ is non-degenerate. Choose
an orthonormal basis $e^1,\dots,e^n$ of $V$. Then we have</p>
\[g(e^{\mu_1}\wedge\dots\wedge e^{\mu_k},e^{\nu_1}\wedge\dots\wedge e^{\nu_k}) = \sum_{\sigma\in \mathfrak{S}_k}{\mathrm{sgn}}(\sigma)g^{\mu_1\nu_{\sigma(1)}}\dots g^{\mu_k\nu_{\sigma(k)}}.\]
<p>However, since the basis is orthonormal then $g^{ii}=\pm 1$ and
$g^{ij}=0$ if $i\neq j$. From this we see that if
$\{\mu_1,\dots,\mu_k\}\neq \{\nu_1,\dots,\nu_k\}$,
then there is <em>no</em> permutation $\sigma\in \mathfrak{S}_k$ for which
$\mu_1=\nu_{\sigma(1)}$, $\dots$, $\mu_k=\nu_{\sigma(k)}$.
Thus the inner product is nonzero only if
$\{\mu_1,\dots,\mu_k\}=\{\nu_1,\dots,\nu_k\}$.
In this case, then, we have that $(\mu_1,\dots,\mu_k)$ is precisely a
permutation of $(\nu_1,\dots,\nu_k)$. Since all symbols
$\mu_1,\dots,\mu_k$ must be distinct (otherwise
$e^{\mu_1}\wedge\dots\wedge e^{\mu_k}$ is zero to begin with), we then
have that there is <em>only one permutation</em> that survives in the sum, the
one for which precisely $\nu_{\sigma(i)}=\mu_i$. In conclusion:</p>
\[g(e^{\mu_1}\wedge\dots\wedge e^{\mu_k},e^{\nu_1}\wedge\dots\wedge e^{\nu_k}) = \begin{cases} (-1)^s{\mathrm{sgn}}(\sigma) & \text{if there exists }\sigma\text{ such that }\nu_{\sigma(j)}=\mu_j\\
0 &\text{ otherwise}
\end{cases},\]
<p>where $s$ is the number of elements in $\{e^{\mu_1},\dots,e^{\mu_k}\}$ which
have <em>negative</em> length. Since we know that the elements of the form
$e^{\mu_1}\wedge\dots\wedge e^{\mu_k}$ form a basis for $\Lambda^kV$,
the previous result tells us that in this basis the matrix of $g$ is
diagonal with entries $\pm 1$, and thus $g$ is non-degenerate.</p>
<h2 id="defining-the-hodge-star">Defining the Hodge star</h2>
<p>Now let ${\mathrm{vol}}\in \Lambda^nV$ be a volume form on $V$, given in
terms of an oriented orthonormal basis $e^1,\dots,e^n$ as</p>
\[{\mathrm{vol}}= e^1\wedge\dots\wedge e^n.\]
<p>We now define the <strong>Hodge
star operator</strong> $\star:\Lambda^kV\to \Lambda^{n-k}V$, as the <em>unique</em>
linear operator such that for all $\alpha,\beta\in \Lambda^{k}V$,</p>
\[\alpha\wedge\star\beta = g(\alpha,\beta){\mathrm{vol}}.\]
<p>Here we’ve sneakily claimed that such a linear operator <em>exists and is
unique</em>. We need to prove that. First, for each
$\beta\in \Lambda^{n-k}V$ define a map
$\phi_\beta:\Lambda^{k}V\to {\mathbb{R}}$ such that</p>
\[\alpha\wedge\beta =\phi_{\beta}(\alpha){\mathrm{vol}}.\]
<p>This map is
well-defined and clearly linear, i.e. $\phi_\beta\in (\Lambda^{k}V)^*$.
In particular, we can see that in components with respect to an
orthonormal basis $e^1,\dots,e^n$ of $V$,</p>
\[\alpha\wedge\beta = \frac{1}{k!(n-k)!}\alpha_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}}{\mathrm{vol}},\]
<p>where $\epsilon^{\lambda_1\dots\lambda_n}$ is the Levi-Civita symbol, so
that</p>
\[\phi_\beta(\alpha) = \frac{1}{k!(n-k)!}\alpha_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}}.\]
<p>Now let’s see that the assignment, let’s call it
$\phi:\Lambda^{n-k}V\to (\Lambda^kV)^*$, given as
$\beta\mapsto \phi_\beta$ is an isomorphism. First, it is clearly
linear. Now suppose that $\phi_\beta=0$, i.e. for all
$\alpha\in\Lambda^{k}V$, $\phi_\beta(\alpha)=0$. In particular, for
$\alpha=e^{\rho_1}\wedge\dots\wedge e^{\rho_k}$,</p>
\[0 = \phi_\beta(e^{\rho_1}\wedge\dots\wedge e^{\rho_k}) = \frac{1}{k!(n-k)!}(e^{\rho_1}\wedge\dots\wedge e^{\rho_k})_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}}.\]
<p>Now we have that the components of the basis itself are</p>
\[(e^{\rho_1}\wedge\dots\wedge e^{\rho_k})_{\mu_1\dots\mu_k} = \begin{cases}
0 &\text{if }{\left\{\mu_1,\dots,\mu_k\right\}}\neq {\left\{\rho_1,\dots,\rho_k\right\}}\\
{\mathrm{sgn}}(\sigma) &\text{ if }\sigma\in \mathfrak{S}_k\text{ such that }\rho_i=\mu_{\sigma(i)}
\end{cases}.\]
<p>We denote the right-hand side as the following symbol:</p>
\[\delta^{\rho_1\dots\rho_k}_{\mu_1\dots\mu_k}:= \begin{cases}
0 &\text{if }{\left\{\mu_1,\dots,\mu_k\right\}}\neq {\left\{\rho_1,\dots,\rho_k\right\}}\\
{\mathrm{sgn}}(\sigma) &\text{ if }\sigma\in \mathfrak{S}_k\text{ such that }\rho_i=\mu_{\sigma(i)}
\end{cases}.\]
<p>A little bit of tedious work shows that</p>
\[\delta^{\rho_1\dots\rho_k}_{\mu_1\dots\mu_k} = \det([\delta^{\rho_i}_{\mu_j}]),\]
<p>i.e. the determinant of the matrix whose $(i,j)$-th entry is
$\delta^{\rho_i}_{\mu_j}$. Now when we plug this back in, we get</p>
\[0 = \phi_\beta(e^{\rho_1}\wedge\dots\wedge e^{\rho_k}) = \frac{1}{k!(n-k)!}\delta^{\rho_1\dots\rho_k}_{\mu_1\dots\mu_k}\beta_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}}.\]
<p>Here we are implicitly summing over all the $\mu_i$ indices. From the
definition above, the only terms that survive in the sum are those for
which there exists a permutation $\sigma\in\mathfrak{S}_k$ such that
$\mu_i=\rho_{\sigma(i)}$. Therefore,</p>
\[\begin{aligned}
\phi_\beta(e^{\rho_1}\wedge\dots\wedge e^{\rho_k}) &= \frac{1}{k!(n-k)!}\sum_{\sigma\in \mathfrak{S}_k}\beta_{\nu_1\dots \nu_{n-k}}{\mathrm{sgn}}(\sigma)\epsilon^{\rho_{\sigma(1)}\dots\rho_{\sigma(k)}\nu_1\dots\nu_{n-k}}\\
&=\frac{1}{k!(n-k)!}\sum_{\sigma\in \mathfrak{S}_k}\beta_{\nu_1\dots \nu_{n-k}}\epsilon^{\rho_{1}\dots\rho_{k}\nu_1\dots\nu_{n-k}}\\
&=\frac{1}{(n-k)!}\beta_{\nu_1\dots \nu_{n-k}}\epsilon^{\rho_{1}\dots\rho_{k}\nu_1\dots\nu_{n-k}}.
\end{aligned}\]
<p>For each set of indices
${\{\nu_1,\dots,\nu_{n-k}\}}$, if we choose
${\{\rho_1,\dots,\rho_{k}\}}$ complementary to it in
${\{1,\dots,n\}}$, then
$\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}=\pm 1$, and so
$\beta_{\nu_1\dots\nu_{n-k}}=0$. Therefore $\beta=0$. The map
$\phi:\beta\mapsto \phi_\beta$ is injective, then, and since
$\dim(\Lambda^{n-k}V)=\dim(\Lambda^{k}V)=\dim((\Lambda^{k}V)^*)$, we
obtain that it is an isomorphism.</p>
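<p>The generalized Kronecker delta used above is easy to get wrong by hand, so here is a small numerical check (my own sketch, assuming NumPy) that the permutation definition agrees with $\det([\delta^{\rho_i}_{\mu_j}])$:</p>

```python
import itertools

import numpy as np

def perm_sign(p):
    """Sign of a permutation (given as a tuple), by counting inversions."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def delta_perm(rho, mu):
    """sgn(sigma) if rho_i = mu_{sigma(i)} for some permutation sigma, else 0."""
    if len(set(rho)) != len(rho) or set(rho) != set(mu):
        return 0
    for sigma in itertools.permutations(range(len(mu))):
        if all(rho[i] == mu[sigma[i]] for i in range(len(mu))):
            return perm_sign(sigma)
    return 0

def delta_det(rho, mu):
    """Determinant of the matrix whose (i, j) entry is delta^{rho_i}_{mu_j}."""
    M = np.array([[1.0 if r == m else 0.0 for m in mu] for r in rho])
    return round(np.linalg.det(M))

n, k = 3, 2
all_match = all(
    delta_perm(rho, mu) == delta_det(rho, mu)
    for rho in itertools.product(range(n), repeat=k)
    for mu in itertools.product(range(n), repeat=k)
)
assert all_match
```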
<p>Recall that we have a metric $g$ on $\Lambda^k V$, which induces an
isomorphism $g_\flat:\Lambda^kV\to (\Lambda^kV)^*$, given as
$g_{\flat}(\alpha):=g(\cdot,\alpha)$. We define then
$\star:\Lambda^kV\to \Lambda^{n-k}V$ as $\star\alpha$ being the unique
element in $\Lambda^{n-k}V$ such that</p>
\[\phi_{\star \alpha}=g_{\flat}(\alpha).\]
<p>If you want to, you could
write $\star=\phi^{-1}\circ g_{\flat}$. At once, this tells us that for
any $\alpha,\beta\in \Lambda^{k}V$,</p>
\[\alpha\wedge\star\beta = \phi_{\star\beta}(\alpha){\mathrm{vol}}= g_\flat(\beta)(\alpha){\mathrm{vol}}= g(\alpha,\beta){\mathrm{vol}}.\]
<p>Okay so the map exists. What about uniqueness? Suppose there is an
isomorphism $\xi:\Lambda^kV\to\Lambda^{n-k}V$ such that
$\alpha\wedge\xi(\beta)=g(\alpha,\beta){\mathrm{vol}}$. This tells us
that $\phi_{\xi(\beta)}(\alpha)=g(\alpha,\beta)$, i.e. that
$\phi_{\xi(\beta)}=g_\flat(\beta)$. But then, by our definition of
$\star$, this precisely means that $\xi(\beta)=\star\beta$.</p>
<p>Now a quick example which will help us down the road: We want to compute
$\star(e^{\rho_1}\wedge\dots\wedge e^{\rho_k})$. We use the fact that
$\star$ is an isomorphism, so we can make an educated guess and just
check it works. Whatever it is, it has to satisfy</p>
\[(e^{\rho_1}\wedge\dots\wedge e^{\rho_k})\wedge \star(e^{\rho_1}\wedge\dots\wedge e^{\rho_k}) = g(e^{\rho_1}\wedge\dots\wedge e^{\rho_k},e^{\rho_1}\wedge\dots\wedge e^{\rho_k}){\mathrm{vol}}= (-1)^s{\mathrm{vol}},\]
<p>where $s$ is, again, the number of elements among $e^{\rho_1},\dots,e^{\rho_k}$ with <em>negative</em> length.
This means that $\star(e^{\rho_1}\wedge\dots\wedge e^{\rho_k})$ has to
consist of the wedges of the basis elements that we don’t have in
$e^{\rho_1},\dots,e^{\rho_k}$. That is, let
${\{\nu_1,\dots,\nu_{n-k}\}}$ be complementary to
${\{\rho_1,\dots,\rho_k\}}$ in ${\{1,\dots,n\}}$.
Therefore,</p>
\[e^{\rho_1}\wedge\dots\wedge e^{\rho_k}\wedge e^{\nu_1}\wedge\dots\wedge e^{\nu_{n-k}} = \epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}{\mathrm{vol}}.\]
<p>With this, we then see that</p>
\[\star(e^{\rho_1}\wedge\dots\wedge e^{\rho_k}) = (-1)^s\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}(e^{\nu_1}\wedge\dots\wedge e^{\nu_{n-k}}) \qquad\text{(no Einstein sum)}.\]
<h1 id="making-it-useful-formulas-in-coordinates">Making it useful: formulas in coordinates</h1>
<p>This is all nice and all but we want to compute the star of a form
explicitly if we have it in terms of some basis. Can do! Let
$e^1,\dots,e^n$ be an orthonormal basis of $V$. By definition, we have
for any $\alpha,\beta\in \Lambda^k V$, that</p>
\[\alpha\wedge\star\beta = g(\alpha,\beta){\mathrm{vol}}.\]
<p>In
components with respect to the orthonormal basis, this is</p>
\[\frac{1}{k!(n-k)!}\alpha_{\mu_1\dots\mu_k}(\star\beta)_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}} = \frac{1}{k!}\alpha^{\mu_1\dots\mu_k}\beta_{\mu_1\dots\mu_k}.\]
<p>Again, we choose $\alpha=e^{\rho_1}\wedge\dots\wedge e^{\rho_k}$, so
that we obtain</p>
\[\frac{1}{k!(n-k)!}\delta^{\rho_1\dots\rho_k}_{\mu_1\dots\mu_k}(\star\beta)_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}} = \frac{1}{k!}g^{\mu_1\lambda_1}\dots g^{\mu_k \lambda_k}\delta_{\lambda_1\dots\lambda_k}^{\rho_1\dots\rho_k}\beta_{\mu_1\dots\mu_k}.\]
<p>On the right-hand side, we have</p>
\[g^{\mu_1\lambda_1}\dots g^{\mu_k \lambda_k}\delta_{\lambda_1\dots\lambda_k}^{\rho_1\dots\rho_k}\beta_{\mu_1\dots\mu_k} = \delta_{\lambda_1\dots\lambda_k}^{\rho_1\dots\rho_k}\beta^{\lambda_1\dots\lambda_k}.\]
<p>But now recall that
$\delta_{\lambda_1\dots\lambda_k}^{\rho_1\dots\rho_k}$ is non-zero only
when there is a permutation $\sigma$ such that
$\lambda_i=\rho_{\sigma(i)}$. Then we have</p>
\[\delta_{\lambda_1\dots\lambda_k}^{\rho_1\dots\rho_k}\beta^{\lambda_1\dots\lambda_k} = \sum_{\sigma\in \mathfrak{S}_k}\delta_{\rho_{\sigma(1)}\dots\rho_{\sigma(k)}}^{\rho_1\dots\rho_k}\beta^{\rho_{\sigma(1)}\dots\rho_{\sigma(k)}} = \sum_{\sigma\in \mathfrak{S}_k}{\mathrm{sgn}}(\sigma)\beta^{\rho_{\sigma(1)}\dots\rho_{\sigma(k)}}= \sum_{\sigma\in \mathfrak{S}_k}{\mathrm{sgn}}(\sigma)^2\beta^{\rho_{1}\dots\rho_{k}}=k!\beta^{\rho_1\dots\rho_k}.\]
<p>On the left-hand side, we have something similar:</p>
\[\delta^{\rho_1\dots\rho_k}_{\mu_1\dots\mu_k}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}} = k!\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}.\]
<p>When we put it all together, we get</p>
\[\frac{1}{(n-k)!}(\star\beta)_{\nu_1\dots\nu_{n-k}}\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}=\beta^{\rho_1\dots\rho_{k}}.\]
<p>We are nearly done! We only need to get rid of that Levi-Civita symbol
on the left-hand side. To do so, consider the sum<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
\[\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}}.\]
<p>The only terms of the sum that are non-zero are when
${\{\rho_1,\dots,\rho_k\}}$ is complementary to <em>both</em> the
sets ${\{\nu_1,\dots,\nu_{n-k}\}}$ and
${\{\lambda_1,\dots,\lambda_{n-k}\}}$ in
${\{1,\dots,n\}}$. This implies that the sum is non-zero only
if
${\{\lambda_1,\dots,\lambda_{n-k}\}}={\{\nu_1,\dots,\nu_{n-k}\}}$.
Then suppose that there is some $\sigma\in\mathfrak{S}_{n-k}$ such
that $\lambda_i=\nu_{\sigma(i)}$. Without the Einstein convention, this
becomes:</p>
\[\begin{aligned}
\sum_{\rho_1,\dots,\rho_k}\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}\epsilon_{\rho_1\dots\rho_k\nu_{\sigma(1)}\dots\nu_{\sigma(n-k)}} &= \sum_{\rho_1,\dots,\rho_k}{\mathrm{sgn}}(\sigma)\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}\epsilon_{\rho_1\dots\rho_k\nu_{1}\dots\nu_{n-k}}\\
&= \sum_{\rho_1,\dots,\rho_k}{\mathrm{sgn}}(\sigma)(\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}})^2\\
&=k!{\mathrm{sgn}}(\sigma).
\end{aligned}\]
<p>In conclusion, we have that</p>
\[\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}} = k!\delta^{\nu_1\dots\nu_{n-k}}_{\lambda_1\dots\lambda_{n-k}}.\]
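<p>This contraction identity is also easy to check by brute force; here is a sketch (my addition, assuming NumPy) for $n=4$, $k=2$:</p>

```python
import itertools

import numpy as np

def perm_sign(p):
    """Sign of a permutation (given as a tuple), by counting inversions."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

n, k = 4, 2
eps = np.zeros((n,) * n)
for p in itertools.permutations(range(n)):
    eps[p] = perm_sign(p)

# Contract the first k indices of eps with itself
contracted = np.einsum('abcd,abef->cdef', eps, eps)

def delta_perm(rho, mu):
    """Generalized Kronecker delta via its permutation definition."""
    if len(set(rho)) != len(rho) or set(rho) != set(mu):
        return 0
    for sigma in itertools.permutations(range(len(mu))):
        if all(rho[i] == mu[sigma[i]] for i in range(len(mu))):
            return perm_sign(sigma)
    return 0

ok = all(
    contracted[nu + lam] == 2 * delta_perm(nu, lam)  # k! = 2
    for nu in itertools.product(range(n), repeat=n - k)
    for lam in itertools.product(range(n), repeat=n - k)
)
assert ok
```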
<p>With this, finally we obtain</p>
\[\frac{1}{(n-k)!}(\star\beta)_{\nu_1\dots\nu_{n-k}}\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}} = \frac{k!}{(n-k)!}(\star\beta)_{\nu_1\dots\nu_{n-k}}\delta^{\nu_1\dots\nu_{n-k}}_{\lambda_1\dots\lambda_{n-k}} = k!(\star\beta)_{\lambda_1\dots\lambda_{n-k}}.\]
<p>Now we put it all together:</p>
\[k!(\star\beta)_{\lambda_1\dots\lambda_{n-k}} =\beta^{\rho_1\dots\rho_k}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}},\]
<p>i.e.</p>
\[(\star\beta)_{\lambda_1\dots\lambda_{n-k}}=\frac{1}{k!}\beta^{\rho_1\dots\rho_k}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}}.\]
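<p>Before moving on, here is a quick numerical check (my own, assuming NumPy) that this component formula satisfies the defining property $\alpha\wedge\star\beta=g(\alpha,\beta){\mathrm{vol}}$, for $1$-forms in Euclidean $\mathbb{R}^3$:</p>

```python
import numpy as np

# Levi-Civita symbol in three dimensions
eps = np.zeros((3, 3, 3))
for idx, s in [((0, 1, 2), 1), ((1, 2, 0), 1), ((2, 0, 1), 1),
               ((0, 2, 1), -1), ((2, 1, 0), -1), ((1, 0, 2), -1)]:
    eps[idx] = s

rng = np.random.default_rng(2)
a = rng.standard_normal(3)  # components of the 1-form alpha
b = rng.standard_normal(3)  # components of the 1-form beta

# (star beta)_{nu lambda} = (1/1!) beta^rho eps_{rho nu lambda}
S = np.einsum('r,rnl->nl', b, eps)

# Coefficient of vol in alpha ^ star(beta), for a 1-form wedge a 2-form in R^3
coef = a[0] * S[1, 2] + a[1] * S[2, 0] + a[2] * S[0, 1]

assert np.isclose(coef, a @ b)  # equals g(alpha, beta) in the Euclidean metric
```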
<p>And we’re done! Right? We’re done, right? … guys? What’s wrong?</p>
<p>…</p>
<p>What do you mean you want it for a <em>general</em> basis?</p>
<p>Okay let $u^1,\dots,u^n$ be some basis, and let $A$ be the
change-of-basis matrix from the $e$ to the $u$ basis, such that</p>
\[u^i = A^{i}_{\phantom{i}j}e^j.\]
<p>In this new basis, the volume form is
<em>not</em> $u^1\wedge\dots\wedge u^n$, but these two are non-zero top-forms
so they’re only scalar multiples of one another. Explicitly,</p>
\[u^1\wedge\dots\wedge u^n = A^1_{\phantom{1}\mu_1}\dots A^n_{\phantom{n}\mu_n}e^{\mu_1}\wedge\dots\wedge e^{\mu_n} = A^1_{\phantom{1}\mu_1}\dots A^n_{\phantom{n}\mu_n}\epsilon^{\mu_1\dots\mu_n}e^{1}\wedge\dots\wedge e^{n} = \det(A){\mathrm{vol}}.\]
<p>What is $\det(A)$? Fortunately we can calculate it easily: Let $g_{u}$
be the matrix representation of $g$ on the $u$-basis, namely</p>
\[[g_u]^{\mu\nu}=g(u^\mu,u^{\nu}).\]
<p>Then</p>
\[[g_u]^{\mu\nu}=g(u^\mu,u^{\nu}) = A^{\mu}_{\phantom{\mu}\rho}A^{\nu}_{\phantom{\nu}\lambda}g(e^{\rho},e^\lambda) = A^{\mu}_{\phantom{\mu}\rho}[g_e]^{\rho\lambda}A^{\nu}_{\phantom{\nu}\lambda}=[A\cdot g_e\cdot A^T]^{\mu\nu},\]
<p>where $g_e$ is the matrix representation of $g$ with respect to the
orthonormal basis, i.e. $g_e$ is diagonal with $\pm 1$ on the diagonal.
Taking the determinant we get</p>
\[\det(g_u)= \det(A)^2\det(g_e)=(-1)^s\det(A)^2,\]
<p>with $s$ being the
number of negative eigenvalues of $g_e$ (also known as the signature of
the metric). Thus,</p>
\[\det(A) = \pm\sqrt{|\det(g_u)|},\]
<p>where the sign
depends on the orientation of the $u$ basis. Thus,</p>
\[u^1\wedge\dots\wedge u^n = \pm \sqrt{|\det(g_u)|}{\mathrm{vol}}.\]
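<p>A tiny numerical check of this rescaling (mine, assuming NumPy), for a made-up non-orthonormal basis of Euclidean $\mathbb{R}^2$:</p>

```python
import numpy as np

# u^i = A^i_j e^j for some invertible (hypothetical) change-of-basis matrix A
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])
g_e = np.eye(2)        # orthonormal basis of Euclidean R^2, so s = 0
g_u = A @ g_e @ A.T    # components of the metric in the u basis

# u^1 ^ u^2 = det(A) vol, and |det(A)| = sqrt(|det(g_u)|)
assert np.isclose(abs(np.linalg.det(A)), np.sqrt(abs(np.linalg.det(g_u))))
```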
<p>With this we can find the expression for $\star$ with respect to any
choice of basis. In the $u$ basis, we have</p>
\[\begin{aligned}
\alpha\wedge\star\beta&=\frac{1}{k!(n-k)!}\alpha_{\mu_1\dots\mu_k}(\star\beta)_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}}u^1\wedge\dots\wedge u^n\\
&=\frac{\pm\sqrt{|\det{g_u}|}}{k!(n-k)!}\alpha_{\mu_1\dots\mu_k}(\star\beta)_{\nu_1\dots\nu_{n-k}}\epsilon^{\mu_1\dots\mu_k\nu_1\dots\nu_{n-k}}{\mathrm{vol}}.
\end{aligned}\]
<p>And thus we can repeat the same process as we did
above, just that we need to carry the $\pm\sqrt{|\det(g_u)|}$ on the
left-hand side for the whole ride. In the end, we get</p>
\[(\star\beta)_{\lambda_1\dots\lambda_{n-k}}=\pm\frac{1}{k!}\frac{1}{\sqrt{|\det(g_u)|}}\beta^{\rho_1\dots\rho_k}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}}.\]
<p>Alright, before you say anything: yes, <em>I know</em> that this is not the same
equation that you’ll see basically everywhere else; the determinant of
the metric should be in the numerator, you say? Yes, but actually no. See,
here we worked with the exterior algebra of a vector space $V$, not its
dual. In practice, with differential forms, we’re working with the
exterior algebra of differential forms, which are dual to the tangent
spaces. That changes the formula a little bit since the matrix of the
metric on the dual is the <em>inverse</em> of the matrix in the tangent space.</p>
<h1 id="on-differential-forms">On differential forms</h1>
<p>Before jumping head-first to differential forms, let’s see what happens
when we try to apply all this on the dual. If we have a metric $g$ on
$V$, it induces isomorphisms $g_{\flat}:V\to V^*$ and
$g^{\sharp}=(g_{\flat})^{-1}:V^*\to V$, given as</p>
\[g_\flat(v)(u)=g(v,u)\]
<p>for all $u, v\in V$, and similarly, for all $\alpha\in V^*$, we have
that $g^\sharp(\alpha)\in V$ is such that</p>
\[g(g^\sharp(\alpha),u)=\alpha(u)\]
<p>for all $u\in V$. If we take a basis
<strong>(INDEX SWITCH ALERT)</strong> $u_1,\dots,u_n\in V$, and write
$v=v^\mu u_\mu$, then what are the components of $g_\flat(v)$? If
$u^1,\dots,u^n$ is the dual basis of $V^*$, then we can write</p>
\[g_\flat(v)=(g_\flat(v))_\mu u^\mu,\]
<p>where</p>
\[g_\flat(v)_\mu = g_\flat(v)(u_\mu)=g(v,u_\mu)=v^\nu g(u_\nu,u_\mu):=v^\nu g_{\nu\mu}.\]
<p>Here we write $g_{\mu\nu}=g(u_\mu,u_\nu)$ as the components of the
metric in this basis. We call this lowering the index of $v$, and we
simply abuse notation by writing $g_\flat(v)=v_\mu u^\mu$, with
$v_\mu:=v^\nu g_{\nu\mu}$.</p>
<p>Similarly, if $\alpha=\alpha_\mu u^\mu\in V^*$, then we have that</p>
\[\alpha_\nu=\alpha(u_\nu)=g(g^\sharp(\alpha),u_\nu)=(g^\sharp(\alpha))^\mu g(u_{\mu},u_\nu)=(g^\sharp(\alpha))^\mu g_{\mu\nu}.\]
<p>Thus we can invert this matrix equation and write</p>
\[(g^\sharp(\alpha))^\mu := (g^{-1})^{\mu\nu}\alpha_\nu,\]
<p>where
$(g^{-1})^{\mu\nu}$ are the components<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> of the <em>inverse</em> of the
matrix of $g$.</p>
<p>So far so good. Now we can simply pull back the metric $g$ from $V$ to
$V^*$ using $g^\sharp$, and define (using the same symbol), for any
$\alpha,\beta\in V^*$,</p>
\[g(\alpha,\beta)=g(g^\sharp(\alpha),g^{\sharp}(\beta)).\]
<p>What are the
components of the dual metric with respect to the $u^1,\dots,u^n$ basis?
Well we compute</p>
\[\begin{aligned}
g^{\mu\nu}=g(u^\mu,u^\nu)=g(g^\sharp(u^\mu),g^\sharp(u^{\nu}))&=(g^{-1})^{\alpha\beta}(g^{-1})^{\rho\sigma}g((u^\mu)_{\beta}u_\alpha,(u^\nu)_\sigma u_\rho)\\
&= (g^{-1})^{\alpha\mu}(g^{-1})^{\rho\nu}g(u_\alpha,u_\rho)\\
&=(g^{-1})^{\alpha\mu}(g^{-1})^{\rho\nu}g_{\alpha\rho}\\
&= (g^{-1})^{\mu\nu}.\end{aligned}\]
<p>Thus the components of the metric on $V^*$ are the components of the
inverse of the metric on $V$. We drop the clunky $^{-1}$ from now on
since there is no ambiguity: $g^{\mu\nu}$ always means the components of
the <em>inverse</em> of the matrix with entries $g_{\mu\nu}$.</p>
<p>Now we’re done! Let $M$ be a smooth manifold with a Lorentzian metric
$g$. We will apply all this, pointwise, to the cotangent spaces of $M$.
By definition, $g$ is a smooth tensor field on $M$ which is point-wise a
metric $g_x:T_xM\times T_xM\to {\mathbb{R}}$. This metric induces a
metric $g^{\text{dual}}$ on $T_x^*M$ via the isomorphism
$(g_x)^\sharp:T_x^*M\to T_xM$ as above. Now we define the Hodge-dual
<em>pointwise</em> but on the <em>cotangent space</em>, so the metric we use is the
<em>inverse</em> of the metric on $TM$. That is, in the notation of section
<a href="#defining-the-hodge-star">Defining the Hodge star</a> we let $V=T_x^*M$, so that the Hodge star
is $\star:\Omega_x^k(M)\to\Omega_x^{n-k}(M)$, but in this case the
components are</p>
\[(\star\beta)_{\lambda_1\dots\lambda_{n-k}}=\pm\frac{1}{k!}\frac{1}{\sqrt{|\det(g^\text{dual}_u)|}}\beta^{\rho_1\dots\rho_k}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}} = \pm\frac{\sqrt{|\det(g_u)|}}{k!}\beta^{\rho_1\dots\rho_k}\epsilon_{\rho_1\dots\rho_k\lambda_1\dots\lambda_{n-k}}.\]
<p>If you look at this expression, it is obviously smooth since the
components of $g$ and $\beta$ are and $\det(g_u)$ is non-zero. Thus we
can happily extend $\star$ to be a global operator</p>
\[\star:\Omega^k(M)\to \Omega^{n-k}(M).\]
<h2 id="a-neat-example-de-rham-vs-curl-grad-div">A neat example: de Rham vs. curl, grad, div</h2>
<p>Now let’s make it explicit. We consider $M={\mathbb{R}}^3$, with its
natural euclidean metric $g$, and coordinates $x,y,z$. The volume form
is simply</p>
\[{\mathrm{vol}}= {\mathrm{d}}{x}\wedge{\mathrm{d}}{y}\wedge{\mathrm{d}}{z}.\]
<p>Now let’s see what $\star$ does to $0$, $1$, $2$, and $3$-forms. First,
recall that if $e^1,\dots,e^n$ is an orthonormal basis, then</p>
\[\star(e^{\rho_1}\wedge\dots\wedge e^{\rho_k}) = (-1)^s\epsilon^{\rho_1\dots\rho_k\nu_1\dots\nu_{n-k}}(e^{\nu_1}\wedge\dots\wedge e^{\nu_{n-k}}) \qquad\text{(no Einstein sum)},\]
<p>where $s$ is the number of elements in $\{e^{\rho_1},\dots,e^{\rho_k}\}$ with
negative length, and
${\{\nu_1,\dots,\nu_{n-k}\}}$ is complementary to
${\{\rho_1,\dots,\rho_k\}}$ in ${\{1,\dots,n\}}$.
Now a $0$-form is just a smooth function, say $f$, and we simply have</p>
\[\star f =f{\mathrm{vol}}= f(x,y,z){\mathrm{d}}{x}\wedge{\mathrm{d}}{y}\wedge{\mathrm{d}}{z}.\]
<p>Now for one-forms, the above result tells us that</p>
\[\begin{aligned}
\star({\mathrm{d}}{x}) &= {\mathrm{d}}{y}\wedge{\mathrm{d}}{z}\\
\star({\mathrm{d}}{y}) &= {\mathrm{d}}{z}\wedge{\mathrm{d}}{x}\\
\star({\mathrm{d}}{z}) &= {\mathrm{d}}{x}\wedge{\mathrm{d}}{y}, \end{aligned}\]
<p>so that</p>
\[\star(\omega_x{\mathrm{d}}{x}+\omega_y{\mathrm{d}}{y}+\omega_z{\mathrm{d}}{z})=\omega_x{\mathrm{d}}{y}\wedge{\mathrm{d}}{z} + \omega_y{\mathrm{d}}{z}\wedge{\mathrm{d}}{x}+\omega_z{\mathrm{d}}{x}\wedge{\mathrm{d}}{y}.\]
<p>Similarly, for $2$-forms we have</p>
\[\star(\omega_{yz}{\mathrm{d}}{y}\wedge{\mathrm{d}}{z} + \omega_{zx}{\mathrm{d}}{z}\wedge{\mathrm{d}}{x}+\omega_{xy}{\mathrm{d}}{x}\wedge{\mathrm{d}}{y})=\omega_{yz}{\mathrm{d}}{x}+\omega_{zx}{\mathrm{d}}{y}+\omega_{xy}{\mathrm{d}}{z}.\]
<p>For $3$-forms,
\(\star(f{\mathrm{vol}}) = f.\)</p>
<p>Now let’s talk about grad, curl, and div. Just as a reminder and for the
sake of completeness, let’s write them down. Let $U\subseteq M$ be an
open set. Then we define
${\mathrm{grad}}:C^{\infty}(U)\to \mathfrak{X}(U)$ as</p>
\[{\mathrm{grad}}(f) = {\frac{\partial f}{\partial x}}{\frac{\partial }{\partial x}} + {\frac{\partial f}{\partial y}}{\frac{\partial }{\partial y}} + {\frac{\partial f}{\partial z}}{\frac{\partial }{\partial z}} = g^\sharp({\mathrm{d}}{f}).\]
<p>Next we define ${\mathrm{curl}}:\mathfrak{X}(U)\to \mathfrak{X}(U)$
as</p>
\[{\mathrm{curl}}\left(f^x{\frac{\partial }{\partial x}} + f^y{\frac{\partial }{\partial y}} + f^z{\frac{\partial }{\partial z}}\right) = \left({\frac{\partial f^z}{\partial y}} - {\frac{\partial f^y}{\partial z}}\right){\frac{\partial }{\partial x}} + \left({\frac{\partial f^x}{\partial z}} - {\frac{\partial f^z}{\partial x}}\right){\frac{\partial }{\partial y}} + \left({\frac{\partial f^y}{\partial x}} - {\frac{\partial f^x}{\partial y}}\right){\frac{\partial }{\partial z}},\]
<p>and finally, ${\mathrm{div}}:\mathfrak{X}(U)\to C^{\infty}(U)$ given
as</p>
\[{\mathrm{div}}\left(f^x{\frac{\partial }{\partial x}} + f^y{\frac{\partial }{\partial y}} + f^z{\frac{\partial }{\partial z}}\right) = {\frac{\partial f^x}{\partial x}} + {\frac{\partial f^y}{\partial y}} + {\frac{\partial f^z}{\partial z}}.\]
<p>It is an elementary exercise to prove that
${\mathrm{curl}}\circ {\mathrm{grad}}= 0$ and
${\mathrm{div}}\circ {\mathrm{curl}}= 0$, so that we have a complex</p>
\[0 \to C^\infty(U)\overset{\mathrm{grad}}{\to}\mathfrak{X}(U)\overset{\mathrm{curl}}{\to}\mathfrak{X}(U)\overset{\mathrm{div}}{\to} C^\infty(U)\to 0,\]
<p>called the <strong>gcd complex</strong>.</p>
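<p>That the composites vanish can be verified symbolically; here is a sketch (my own, assuming SymPy) using the coordinate formulas above on an arbitrary made-up function and vector field:</p>

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(f):
    return [sp.diff(f, x), sp.diff(f, y), sp.diff(f, z)]

def curl(F):
    Fx, Fy, Fz = F
    return [sp.diff(Fz, y) - sp.diff(Fy, z),
            sp.diff(Fx, z) - sp.diff(Fz, x),
            sp.diff(Fy, x) - sp.diff(Fx, y)]

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

f = sp.sin(x * y) + z**3
F = [x * y * z, sp.exp(x) * y, x * sp.cos(z)]

curl_grad = [sp.simplify(c) for c in curl(grad(f))]
div_curl = sp.simplify(div(curl(F)))

assert curl_grad == [0, 0, 0]  # curl of a gradient vanishes
assert div_curl == 0           # div of a curl vanishes
```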
<p>Now let’s compare this to the de Rham differential, explicitly to be
more clear: for a $0$-form,</p>
\[{\mathrm{d}}f = {\frac{\partial f}{\partial x}}{\mathrm{d}}{x} + {\frac{\partial f}{\partial y}}{\mathrm{d}}{y} + {\frac{\partial f}{\partial z}}{\mathrm{d}}{z}.\]
<p>For a $1$-form,</p>
\[{\mathrm{d}}(\omega_x{\mathrm{d}}{x}+\omega_y{\mathrm{d}}{y}+\omega_z{\mathrm{d}}{z})=\left({\frac{\partial \omega_z}{\partial y}}-{\frac{\partial \omega_y}{\partial z}}\right){\mathrm{d}}{y}\wedge{\mathrm{d}}{z} + \left({\frac{\partial \omega_x}{\partial z}}-{\frac{\partial \omega_z}{\partial x}}\right){\mathrm{d}}{z}\wedge{\mathrm{d}}{x} + \left({\frac{\partial \omega_y}{\partial x}}-{\frac{\partial \omega_x}{\partial y}}\right){\mathrm{d}}{x}\wedge{\mathrm{d}}{y}.\]
<p>And for a $2$-form,</p>
\[{\mathrm{d}}(\omega_{yz}{\mathrm{d}}{y}\wedge{\mathrm{d}}{z} + \omega_{zx}{\mathrm{d}}{z}\wedge{\mathrm{d}}{x}+\omega_{xy}{\mathrm{d}}{x}\wedge{\mathrm{d}}{y}) = \left({\frac{\partial \omega_{yz}}{\partial x}} + {\frac{\partial \omega_{zx}}{\partial y}} +{\frac{\partial \omega_{xy}}{\partial z}}\right){\mathrm{d}}{x}\wedge{\mathrm{d}}{y}\wedge{\mathrm{d}}{z}.\]
<p>This tells us that there is an isomorphism between the gcd complex and
the de Rham complex, given as $\phi_0:C^\infty(U)\to C^\infty(U)$ being
simply $\phi_0={\mathrm{id}}$. We also have
$\phi_1:\mathfrak{X}(U)\to \Omega^1(U)$ as</p>
\[\phi_1\left(f^x{\frac{\partial }{\partial x}} + f^y{\frac{\partial }{\partial y}} + f^z{\frac{\partial }{\partial z}}\right) = f^x{\mathrm{d}}{x} + f^y{\mathrm{d}}{y} + f^z{\mathrm{d}}{z},\]
<p>i.e. $\phi_1=g_\flat$. We also have
$\phi_2:\mathfrak{X}(U)\to\Omega^2(U)$ given as</p>
\[\phi_2\left(f^x{\frac{\partial }{\partial x}} + f^y{\frac{\partial }{\partial y}} + f^z{\frac{\partial }{\partial z}}\right) = f^x{\mathrm{d}}{y}\wedge{\mathrm{d}}{z} + f^y{\mathrm{d}}{z}\wedge{\mathrm{d}}{x}+f^z{\mathrm{d}}{x}\wedge{\mathrm{d}}{y},\]
<p>which we can identify as $\phi_2 = \star\circ g_\flat$. Finally, we have
$\phi_3:C^{\infty}(U)\to\Omega^3(U)$ given as</p>
\[\phi_3(f)=f{\mathrm{d}}{x}\wedge{\mathrm{d}}{y}\wedge{\mathrm{d}}{z}=\star(f).\]
<p>By construction, we have $\phi_0=\mathrm{id}_{C^{\infty}(U)}$,
$\phi_1=g_\flat$, $\phi_2=\star\circ g_{\flat}$ and $\phi_3=\star$, all
of which are isomorphisms. Now we need to check that the diagram</p>
<p><img src="/assets/posts/hodge-star/gcd-complex.png" alt="" /></p>
<p>commutes… but this is a straightforward, albeit a bit boring,
computation. Therefore we have that $\phi_\bullet$ is an isomorphism of
complexes, which induces an isomorphism in cohomology:</p>
\[\begin{aligned}
\ker({\mathrm{grad}})&\cong H^0(U)\\
\ker({\mathrm{curl}})/{\mathrm{im}}({\mathrm{grad}})&\cong H^1(U)\\
\ker({\mathrm{div}})/{\mathrm{im}}({\mathrm{curl}})&\cong H^2(U)\\
C^\infty(U)/{\mathrm{im}}({\mathrm{div}}) &\cong H^3(U).\end{aligned}\]
<p>So suppose you have a vector field $\mathbf{E}\in \mathfrak{X}(U)$
satisfying ${\mathrm{curl}}(\mathbf{E})=0$. When can you guarantee
that $\mathbf{E}={\mathrm{grad}}({\varphi})$ for some scalar function
${\varphi}\in C^\infty(U)$? The previous result tells us that when
$H^1(U)=0$ (for instance, when $U$ is simply connected), every irrotational
field is a gradient. Similarly, if you have a field $\mathbf{B}$ such
that ${\mathrm{div}}(\mathbf{B})=0$, then if $H^2(U)=0$, we can
guarantee that $\mathbf{B}={\mathrm{curl}}(\mathbf{A})$ for some
field $\mathbf{A}\in \mathfrak{X}(U)$.</p>
<h2 id="another-neat-example-wedge-and-cross-product">Another neat example: wedge and cross product</h2>
<p>Have you noticed that the cross product in ${\mathbb{R}}^3$ behaves very
similarly to the wedge product? With the antisymmetry and all. There is,
of course, a huge difference between both: the cross product returns
another vector in ${\mathbb{R}}^3$, i.e.
$\times:{\mathbb{R}}^3\times{\mathbb{R}}^3\to {\mathbb{R}}^3$, whereas
the wedge product returns an element of the exterior product of
${\mathbb{R}}^3$ with itself,
$\wedge:{\mathbb{R}}^3\times {\mathbb{R}}^3\to {\mathbb{R}}^3\wedge{\mathbb{R}}^3$.
How can we bridge both?</p>
<p>On ${\mathbb{R}}^3$ we have the canonical euclidean metric $g$, and thus
we have the Hodge star
$\star:{\mathbb{R}}^3\wedge{\mathbb{R}}^3\to {\mathbb{R}}^3$. With this
we can construct an antisymmetric map
$\star\circ\wedge:{\mathbb{R}}^3\times{\mathbb{R}}^3\to {\mathbb{R}}^3$.</p>
<p>Let $e_1,e_2,e_3$ be the canonical orthonormal basis of
${\mathbb{R}}^3$. We then see that</p>
\[\begin{aligned}
\star(e_1\wedge e_2)&=e_3\\
\star(e_2\wedge e_3)&=e_1\\
\star(e_3\wedge e_1)&=e_2.\end{aligned}\]
<p>However, this is the same as</p>
\[\begin{aligned}
e_1\times e_2&=e_3\\
e_2\times e_3&=e_1\\
e_3\times e_1&=e_2.\end{aligned}\]
<p>We then happily conclude that</p>
\[u\times v = \star(u\wedge v)\]
<p>for all $u,v\in {\mathbb{R}}^3$.</p>
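<p>And a quick numerical confirmation of this (my addition, assuming NumPy), writing the Hodge star in components:</p>

```python
import numpy as np

# Levi-Civita symbol in three dimensions
eps = np.zeros((3, 3, 3))
for idx, s in [((0, 1, 2), 1), ((1, 2, 0), 1), ((2, 0, 1), 1),
               ((0, 2, 1), -1), ((2, 1, 0), -1), ((1, 0, 2), -1)]:
    eps[idx] = s

rng = np.random.default_rng(3)
u = rng.standard_normal(3)
v = rng.standard_normal(3)

W = np.outer(u, v) - np.outer(v, u)            # components (u ^ v)_{ij}
star_w = 0.5 * np.einsum('ij,ijk->k', W, eps)  # (star w)_k = (1/2!) w^{ij} eps_{ij k}

assert np.allclose(star_w, np.cross(u, v))
```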
<h1 id="the-takeaway">The takeaway</h1>
<p>The Hodge star operator makes an explicit isomorphism between the
exterior powers $\Lambda^kV$ and $\Lambda^{n-k}V$ of a vector space with
the aid of a metric. With it we can bridge the similarities between
exterior products of complementary degrees. It also makes explicit the
relationship between the cross product and vector calculus in
${\mathbb{R}}^3$ and the calculus and algebra of differential forms on
it. As a quick corollary, we saw that the existence of <em>potentials</em> for
certain functions depends on the topology of the underlying space (which
in most cases in physics is trivial).</p>
<p>With the Hodge star we will be able to neatly write Maxwell’s equations,
and more importantly, generalize them for a large class of physical
fields: gauge fields.</p>
<h1 id="references">References</h1>
<ul>
<li>
<p>Báez, J. C. and Muniain, J. P. (1994). <em>Gauge Fields, Knots, and
Gravity</em>, World Scientific. Section I.5. Honestly I don’t even know
why you’d read this post, go and read Báez instead.</p>
</li>
<li>
<p>Quintero Vélez, A. (2018). <em>Notas de Fundamentos Matemáticos de las
Teorías de Campos Gauge</em>.</p>
</li>
<li>
<p>Fecko, M. (2006). <em>Differential Geometry and Lie Groups for
Physicists</em>. Cambridge University Press. This is a great book with
great humor. Chapter 5 is a good introduction to differential forms.
It is a problem-driven book.</p>
</li>
<li>
<p>Conrad, B. <em>Notes for Math 396: Tensor Algebras, Tensor Pairings,
and
Duality.</em><a href="http://virtualmath1.stanford.edu/~conrad/diffgeomPage/handouts/tensor.pdf">http://virtualmath1.stanford.edu/~conrad/diffgeomPage/handouts/tensor.pdf</a></p>
</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Here, the Levi-Civita symbol with lowered indices is <em>not</em> lowered
with the metric, i.e. we consider (but <em>only</em> for the Levi-Civita
symbol!)
\(\epsilon^{\mu_1\dots\mu_k}=\epsilon_{\mu_1\dots\mu_k}.\)
This means that we are not allowed to raise or lower the indices of
$\epsilon$ with the metric. It is <em>just</em> a convenient symbol for
adding things that does not represent the components of a tensor! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>yes… I know that we don’t carry around the $^{-1}$. Just gimme a
minute okay? <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Santiago Quintero de los Ríos
Have you ever had a k-form but wanted a (n-k)-form? Do you have a metric? Well you're in luck! Here we tell you how the Hodge star operator lets you pass from one to the other.
Multiparticle states in quantum mechanics, part 2: Constructing the Fock space2019-03-19T00:00:00+01:002019-03-19T00:00:00+01:00http://homotopico.com/2019/03/19/fock-2-state
<p>Get the pdf version of this post <a href="/assets/docs/pdf_posts/fock-2-state.pdf">here</a>.</p>
<p>This time we’re going to be more concrete. The general theory of
many-particle states can be quite tricky and unintuitive, so it’s better
to start everything with the simplest example: a 2-state system.</p>
<p>Last time in Homotopico, we derived the Hilbert spaces that correspond
to multiple identical particles. If a single particle is represented by
the Hilbert space $ \mathcal{H}$, then there are two kinds of
indistiguishable $k$-particle spaces: the bosonic space
$ \mathcal{S}^k \mathcal{H}$ and the fermionic space
$\Lambda^k \mathcal{H}$. These spaces are defined in terms of the
permutation operators $P_\sigma$ (in the previous post I called them
$T_\sigma$), which are the natural representation of the permutation
group $ \mathfrak{S}_k$ on $\otimes^k \mathcal{H}$: for each
$\sigma\in \mathfrak{S}_k$, define</p>
\[P_\sigma{\vert \psi_1\dots\psi_k \rangle}={\vert \psi_{\sigma(1)}\dots\psi_{\sigma(k)} \rangle}.\]
<p>The bosonic space is composed of vectors that are <em>permutation
invariant</em>,</p>
\[\mathcal{S}^k \mathcal{H} = {\left\{ {\vert \Psi \rangle} \in \otimes^k \mathcal{H}~:~P_\sigma{\vert \Psi \rangle}={\vert \Psi \rangle}\quad \forall\sigma\in \mathfrak{S}_k\right\}},\]
<p>while the fermionic space is composed of vectors that are <em>reversed</em>
under odd permutations, i.e.</p>
\[\Lambda^k \mathcal{H} = {\left\{ {\vert \Psi \rangle} \in \otimes^k \mathcal{H}~:~P_\sigma{\vert \Psi \rangle}=\operatorname{sgn}(\sigma){\vert \Psi \rangle}\quad \forall\sigma\in \mathfrak{S}_k\right\}}.\]
<p>It can be checked that these are indeed Hilbert spaces (we won’t do that
here). What we will do now is construct state spaces with an
<em>arbitrary number of particles</em>, instead of a <em>fixed</em> number of
particles. Here, we will first focus on a 2-state system, for example
polarization of photons or spin-$\frac{1}{2}$.</p>
<h1 id="sec:2-states-1">Two states, one particle</h1>
<p>When we talk about a system with $2$ states, we <em>really</em> mean that the
Hilbert space $ \mathcal{H}$ is of (complex) dimension $2$. This tells
us that we can find a basis
$\left\{ {\vert a \rangle},{\vert b \rangle}\right\}$, such that
${\left\langle a \middle\vert a\right\rangle}={\left\langle b \middle\vert b\right\rangle}=1$
and ${\left\langle a \middle\vert b\right\rangle}=0$. In the case of
spin-$\frac{1}{2}$ systems, ${\vert a \rangle},{\vert b \rangle}$ are
normalized eigenstates of some spin operator, often $\hat{S}_z$. In the
case of photon polarization, ${\vert a \rangle},{\vert b \rangle}$
represent states of pure horizontal or vertical polarization with
respect to some axis. What the kets ${\vert a \rangle}$ and
${\vert b \rangle}$ actually stand for (physically speaking) is
irrelevant for our discussion. What is important is that we have an
orthonormal basis that consists of $2$ vectors, which means that any
(one-particle) state in $ \mathcal{H}$ can be written as</p>
\[{\vert \psi \rangle} = \alpha{\vert a \rangle} + \beta{\vert b \rangle},\]
<p>with $\alpha,\beta\in {\mathbb{C}}$.</p>
<h1 id="sec:2-particles">2-particle states</h1>
<p>We now stick two of these one-particle spaces together via the tensor
product. Recall that $ \mathcal{H}\otimes \mathcal{H}$ is the vector
space of <em>linear combinations</em> of elements of the form
${\vert \psi_1\psi_2 \rangle}:={\vert \psi_1 \rangle}\otimes{\vert \psi_2 \rangle}$
with ${\vert \psi_1 \rangle},{\vert \psi_2 \rangle}\in \mathcal{H}$.
Since we have a basis ${\vert a \rangle},{\vert b \rangle}$ of
$ \mathcal{H}$, we can write</p>
\[\begin{aligned}
{\vert \psi_1 \rangle} &= \alpha_1{\vert a \rangle} + \beta_1{\vert b \rangle}\\
{\vert \psi_2 \rangle} &= \alpha_2{\vert a \rangle} + \beta_2{\vert b \rangle},
\end{aligned}\]
<p>so that</p>
\[{\vert \psi_1\psi_2 \rangle} = (\alpha_1{\vert a \rangle} + \beta_1{\vert b \rangle})\otimes(\alpha_2{\vert a \rangle} + \beta_2{\vert b \rangle}) = \alpha_1\alpha_2{\vert aa \rangle} + \alpha_1\beta_2{\vert ab \rangle} + \beta_1\alpha_2{\vert ba \rangle} + \beta_1\beta_2{\vert bb \rangle}.\]
<p>This means that, in general, <em>any</em> element of
$ \mathcal{H}\otimes \mathcal{H}$ can be written as a linear
combination of
${\vert aa \rangle},{\vert ab \rangle},{\vert ba \rangle},{\vert bb \rangle}$.
However, not every element of $ \mathcal{H}\otimes \mathcal{H}$ can be
written as ${\vert \psi_1\psi_2 \rangle}$ for some
${\vert \psi_1 \rangle},{\vert \psi_2 \rangle}\in \mathcal{H}$. The
elements that can be written in this way are called <strong>product states</strong>.</p>
<p>If we define the inner product on $ \mathcal{H}\otimes \mathcal{H}$ as</p>
\[{\left\langle \psi_1\otimes \psi_2 \middle\vert \phi_1\otimes\phi_2\right\rangle}:={\left\langle \psi_1 \middle\vert \phi_1\right\rangle}{\left\langle \psi_2 \middle\vert \phi_2\right\rangle},\]
<p>it can be easily shown that
$\left\{ {\vert aa \rangle},{\vert ab \rangle},{\vert ba \rangle},{\vert bb \rangle} \right\}$
forms an orthonormal basis of $ \mathcal{H}\otimes \mathcal{H}$.</p>
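<p>This is easy to verify numerically. Here is a small NumPy sketch (not part of the formal argument): we take the standard basis of $\mathbb{C}^2$ for ${\vert a \rangle},{\vert b \rangle}$, build the four tensor products with the Kronecker product, and check that their Gram matrix is the identity.</p>

```python
import numpy as np

# Single-particle basis of a 2-state system: |a> and |b> as vectors in C^2.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Tensor products |aa>, |ab>, |ba>, |bb> via the Kronecker product.
basis = [np.kron(x, y) for x in (a, b) for y in (a, b)]

# The Gram matrix of pairwise inner products should be the 4x4 identity,
# confirming that these four vectors form an orthonormal basis.
gram = np.array([[np.vdot(u, v) for v in basis] for u in basis])
print(np.allclose(gram, np.eye(4)))  # True
```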
<p>We interpret the state ${\vert \psi_1\psi_2 \rangle}$ as saying
“particle 1 is in state ${\vert \psi_1 \rangle}$ and particle 2 is in
state ${\vert \psi_2 \rangle}$”. Since in general
${\vert \psi_1\psi_2 \rangle}\neq{\vert \psi_2\psi_1 \rangle}$, under
this interpretation our particles are <em>distinguishable</em>. Indeed, that’s
why it even makes sense to speak of particle 1 and particle 2.</p>
<h2 id="sec:intr-indist">Introducing indistinguishability</h2>
<p>In the previous post, we explained that the spaces of indistinguishable
particles are those in which the permutation transformation
$P: \mathcal{H}\otimes \mathcal{H}\to \mathcal{H}\otimes \mathcal{H}$,
given by</p>
<p>\(P{\vert \psi_1\psi_2 \rangle} := {\vert \psi_2\psi_1 \rangle},\)
and
extended everywhere by linearity, <em>is an absolute symmetry</em>.
Equivalently, a state
${\vert \Psi \rangle}\in \mathcal{H}\otimes \mathcal{H}$ (not
necessarily a product state!) represents indistinguishable particles if
and only if
\(P{\vert \Psi \rangle} = \pm{\vert \Psi \rangle}.\)
The
<em>bosonic</em> states are the ones for which
$P{\vert \Psi \rangle}={\vert \Psi \rangle}$, and the <em>fermionic</em> states
are the ones for which $P{\vert \Psi \rangle} = -{\vert \Psi \rangle}$.
In the previous post we showed that given a product state
${\vert \Psi \rangle}= {\vert \psi_1\psi_2 \rangle}$, the state</p>
\[\frac{1}{2}\left({\vert \psi_1\psi_2 \rangle}+{\vert \psi_2\psi_1 \rangle}\right)\]
<p>is a bosonic state, while</p>
\[\frac{1}{2}\left({\vert \psi_1\psi_2 \rangle}-{\vert \psi_2\psi_1 \rangle}\right)\]
<p>is a fermionic state, and neither is necessarily normalized. Indeed,
<em>all</em> the bosonic and fermionic 2-particle states are expressed as
linear combinations of elements of that form! To see this, consider an
arbitrary state ${\vert \Psi \rangle}$. We can write it in terms of the
basis as</p>
\[{\vert \Psi \rangle} = \alpha_{11}{\vert aa \rangle} + \alpha_{12}{\vert ab \rangle} + \alpha_{21}{\vert ba \rangle} + \alpha_{22}{\vert bb \rangle},\]
<p>where $\alpha_{11},\dots,\alpha_{22}\in {\mathbb{C}}$ are complex
numbers. Now we apply $P$:</p>
\[\begin{aligned}
P{\vert \Psi \rangle} &= \alpha_{11}P{\vert aa \rangle} + \alpha_{12}P{\vert ab \rangle} + \alpha_{21}P{\vert ba \rangle} + \alpha_{22}P{\vert bb \rangle}\\
&= \alpha_{11}{\vert aa \rangle} + \alpha_{12}{\vert ba \rangle} + \alpha_{21}{\vert ab \rangle}
+ \alpha_{22}{\vert bb \rangle}.
\end{aligned}\]
<p>If $P{\vert \Psi \rangle}=\pm{\vert \Psi \rangle}$,
this tells us that</p>
\[\begin{aligned}
\alpha_{11}&=\pm\alpha_{11}\\
\alpha_{12}&=\pm\alpha_{21}\\
\alpha_{22}&=\pm\alpha_{22}.
\end{aligned}\]
<p>Note that if
$P{\vert \Psi \rangle}=-{\vert \Psi \rangle}$ (i.e.
${\vert \Psi \rangle}$ is fermionic), then $\alpha_{11}=-\alpha_{11}$,
which means that $\alpha_{11}=0$, and similarly $\alpha_{22}=0$.
Substituting back in ${\vert \Psi \rangle}$ for the general case, we
obtain:</p>
\[\begin{aligned}
{\vert \Psi \rangle} &= \alpha_{11}{\vert aa \rangle} + \alpha_{12}{\vert ab \rangle} + \alpha_{21}{\vert ba \rangle} + \alpha_{22}{\vert bb \rangle}\\
&= \alpha_{11}{\vert aa \rangle} + \alpha_{12}{\vert ab \rangle} \pm \alpha_{12}{\vert ba \rangle} + \alpha_{22}{\vert bb \rangle}\\
&=\alpha_{11}{\vert aa \rangle} + \alpha_{12}\left({\vert ab \rangle} \pm {\vert ba \rangle}\right) + \alpha_{22}{\vert bb \rangle}\\
&=\frac{\alpha_{11}}{2}({\vert aa \rangle}\pm {\vert aa \rangle}) +
\frac{2\alpha_{12}}{2}\left({\vert ab \rangle} \pm {\vert ba \rangle}\right) +
\frac{\alpha_{22}}{2}({\vert bb \rangle}\pm{\vert bb \rangle}),\qquad (\star)
\end{aligned}\]
<p>and therefore indeed ${\vert \Psi \rangle}$ is a
linear combination of elements of the form</p>
\[\frac{1}{2}({\vert \psi_1\psi_2 \rangle}\pm{\vert \psi_2\psi_1 \rangle}).\]
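<p>A quick numerical sketch of this decomposition (again just an illustration): on $\mathbb{C}^4$ with the ordered basis ${\vert aa \rangle},{\vert ab \rangle},{\vert ba \rangle},{\vert bb \rangle}$, the permutation $P$ is a $4\times 4$ matrix, and splitting an arbitrary vector into $\frac{1}{2}({\vert \Psi \rangle}\pm P{\vert \Psi \rangle})$ produces its bosonic and fermionic parts.</p>

```python
import numpy as np

# Swap operator P on H (x) H in the ordered basis |aa>, |ab>, |ba>, |bb>:
# it exchanges |ab> and |ba> and fixes |aa> and |bb>.
P = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

rng = np.random.default_rng(0)
psi = rng.normal(size=4)      # an arbitrary 2-particle state (not a product state in general)

sym = (psi + P @ psi) / 2     # bosonic part
anti = (psi - P @ psi) / 2    # fermionic part

print(np.allclose(P @ sym, sym))     # True: P fixes the bosonic part
print(np.allclose(P @ anti, -anti))  # True: P flips the sign of the fermionic part
print(np.allclose(sym + anti, psi))  # True: for k=2 the two parts recover the state
```

Note that the last check is special to two particles: for $k=2$ the bosonic and fermionic subspaces together span the whole space, which fails for $k\geq 3$.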
<p>In particular, we can find bases for the boson and fermion 2-particle
states. From equation $(\star)$ we see that a basis for the boson space
is given by</p>
\[{\vert aa \rangle},\quad \frac{1}{2}({\vert ab \rangle}+{\vert ba \rangle}),\quad{\vert bb \rangle}.\]
<p>and so the boson space has dimension $3$. We can now <em>orthonormalize</em>
this basis, to obtain a (you guessed it), orthonormal basis. It turns
out that these three elements are already orthogonal, and
${\vert aa \rangle},{\vert bb \rangle}$ are already normalized (why?) so
we only need to normalize the remaining one. This means that we have to
calculate its norm:</p>
\[\begin{aligned}
\|({\vert ab \rangle}+{\vert ba \rangle})\|^2 & ={\left\langle ab+ba \middle\vert ab+ba\right\rangle}\\
&=\left({\left\langle ab \middle\vert ab\right\rangle} + {\left\langle ab \middle\vert ba\right\rangle} + {\left\langle ba \middle\vert ab\right\rangle} +{\left\langle ba \middle\vert ba\right\rangle}\right)\\
&=\left({\left\langle a \middle\vert a\right\rangle}{\left\langle b \middle\vert b\right\rangle} + {\left\langle a \middle\vert b\right\rangle}{\left\langle b \middle\vert a\right\rangle} + {\left\langle b \middle\vert a\right\rangle}{\left\langle a \middle\vert b\right\rangle} +{\left\langle b \middle\vert b\right\rangle}{\left\langle a \middle\vert a\right\rangle}\right)\\
&=(1+0+0+1)\\
&=2,
\end{aligned}\]
<p>And this tells us that the normalized state is</p>
<p>\(\frac{1}{\sqrt{2}}({\vert ab \rangle}+{\vert ba \rangle}).\)
We have
now obtained an orthonormal basis for the boson space, which we call the
<strong>occupation number basis</strong>. We define it as</p>
\[\begin{aligned}
{\vert 2,0 \rangle} &:= {\vert aa \rangle}\\
{\vert 1,1 \rangle} &:= \frac{1}{\sqrt{2}}({\vert ab \rangle}+{\vert ba \rangle})\\
{\vert 0,2 \rangle} &:= {\vert bb \rangle}.
\end{aligned}\]
<p>We interpret ${\vert 2,0 \rangle}$ as a state with two
particles in state ${\vert a \rangle}$ and no particles in state
${\vert b \rangle}$, ${\vert 1,1 \rangle}$ as a state with one particle
in each of the states ${\vert a \rangle},{\vert b \rangle}$, and
${\vert 0,2 \rangle}$ as a state with no particles in state
${\vert a \rangle}$ and two particles in state ${\vert b \rangle}$ (see image
below).</p>
<p><img src="/assets/posts/fock-2-state/2-state-boson.png" alt="" /></p>
<p>Similarly, from equation $(\star)$ for fermions, we see that a basis is
given by the single element</p>
\[\frac{1}{2}({\vert ab \rangle}-{\vert ba \rangle}).\]
<p>Therefore we conclude that the fermion space is $1$-dimensional. We can
normalize this state to obtain the <strong>occupation number basis</strong> for
fermions (which in this case is a single, lonely state):</p>
\[{\vert 1,1 \rangle} := \frac{1}{\sqrt{2}}({\vert ab \rangle}-{\vert ba \rangle}).\]
<p>Note that there are <em>no other</em> independent states, and unlike the
bosonic case, we don’t have states that represent two particles in the
same state. For fermions, no two particles are ever in the same state:
this is called <strong>Pauli’s exclusion principle</strong>.</p>
<h1 id="sec:3-particle-states">3-particle states</h1>
<p>Instead of jumping to the general $n$-particle case, we will consider
the 3-particle states. Here, the permutations are much more complicated,
and so a thorough understanding of this case will give us a good
intuition to work on the general case.</p>
<p>Our “distinguishable” 3-particle space is
$\otimes^3 \mathcal{H}= \mathcal{H}\otimes \mathcal{H}\otimes \mathcal{H}$,
which is the vector space of <em>linear combinations</em> of elements of the
form
${\vert \psi_1\psi_2\psi_3 \rangle}={\vert \psi_1 \rangle}\otimes{\vert \psi_2 \rangle}\otimes{\vert \psi_3 \rangle}$
for ${\vert \psi_j \rangle}\in \mathcal{H}$. Following the exact same
procedure as above, we can see that any element of
$\otimes^3 \mathcal{H}$ can be expressed as</p>
\[\begin{aligned}
{\vert \Psi \rangle} &=\phantom{+} \alpha_{111}{\vert aaa \rangle} + \alpha_{112}{\vert aab \rangle} + \alpha_{121}{\vert aba \rangle} + \alpha_{122}{\vert abb \rangle}\\
&\phantom{=} + \alpha_{211}{\vert baa \rangle} + \alpha_{212}{\vert bab \rangle} +
\alpha_{221}{\vert bba \rangle} +\alpha_{222}{\vert bbb \rangle},
\end{aligned}\]
<p>for $\alpha_{ijk}\in {\mathbb{C}}$, and $i,j,k =1,2$.</p>
<h2 id="sec:intr-indist-3">Introducing indistinguishability</h2>
<p>Here’s where things get tricky, since there are many more permutations.
This means that we need to introduce a little bit more notation. Recall
that a <strong>permutation</strong> of $k$ numbers is a function $\sigma$ that
reorders the set $(1,2,\dots,k)$. For example, the function defined as
$\sigma(1)=2$, $\sigma(2)=3$ and $\sigma(3)=1$ is a permutation of
$(1,2,3)$. We may write it more concisely as $\sigma = (2,3,1)$, or in
general, $\sigma = (\sigma(1),\dots,\sigma(k))$ for a permutation of $k$
numbers. The set of all permutations of $k$ numbers is denoted by
$ \mathfrak{S}_k$. It can be shown that it is a group under
composition, and that it has exactly $k!$ elements.</p>
<p>Now in the case of $3$ particles, we need to consider permutations of
$3$ numbers. There are $3!=6$ such permutations, namely:</p>
\[\begin{align*}
(1,2,3) \quad (3,1,2) \quad (2,3,1) \\
(1,3,2) \quad (2,1,3) \quad (3,2,1)
\end{align*}.\]
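<p>We can enumerate these permutations, together with their signs, in a few lines of Python (a side computation, not part of the argument; the sign here is computed by counting inversions, which agrees with the cyclic description given below):</p>

```python
from itertools import permutations

def sign(sigma):
    """Sign of a permutation (given as a tuple), computed by counting inversions."""
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma))
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

perms = sorted(permutations((1, 2, 3)))
print(len(perms))  # 6 = 3!
for p in perms:
    print(p, sign(p))
```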
<p>Let’s recall the natural action of $ \mathfrak{S}_3 $ on
$\otimes^3 \mathcal{H}$. For each $\sigma\in \mathfrak{S}_3 $, define
an operator $P_\sigma$ which acts on <em>product states</em> as</p>
\[P_\sigma{\vert \psi_1\psi_2\psi_3 \rangle}={\vert \psi_{\sigma(1)}\psi_{\sigma(2)}\psi_{\sigma(3)} \rangle}.\]
<p>This action is extended by linearity everywhere else (recall that not
all elements are product states!). As a particular example, choose
$\sigma = (3,1,2)$. So we have</p>
\[P_\sigma{\vert \psi_1\psi_2\psi_3 \rangle}={\vert \psi_{3}\psi_{1}\psi_{2} \rangle}.\]
<p>To be a bit more explicit, suppose
${\vert \Psi \rangle}= {\vert aba \rangle}-2{\vert bba \rangle}$. Then</p>
\[P_\sigma{\vert \Psi \rangle}=P_\sigma{\vert aba \rangle}-2P_\sigma{\vert bba \rangle} = {\vert aab \rangle}-2{\vert abb \rangle}.\]
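<p>This bookkeeping is mechanical enough to sketch in code. In the illustration below (an assumption of the sketch, not anything physical), a state is a dictionary mapping basis strings like <code>"aba"</code> to coefficients, and $P_\sigma$ acts slot-by-slot and extends linearly:</p>

```python
# Represent a 3-particle state as a dict {basis string: coefficient},
# e.g. |aba> - 2|bba> becomes {"aba": 1, "bba": -2}.
def permute(state, sigma):
    """Apply P_sigma: |psi_1 psi_2 psi_3> -> |psi_sigma(1) psi_sigma(2) psi_sigma(3)>,
    extended by linearity. sigma is a tuple using 1-based slot labels."""
    out = {}
    for ket, coeff in state.items():
        new_ket = "".join(ket[i - 1] for i in sigma)
        out[new_ket] = out.get(new_ket, 0) + coeff
    return out

psi = {"aba": 1, "bba": -2}
print(permute(psi, (3, 1, 2)))  # {'aab': 1, 'abb': -2}
```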
<p>As before, in order to talk about indistinguishability, we need to
restrict ourselves to a subspace of $\otimes^3 \mathcal{H}$ where the
action of $ \mathfrak{S}_3$ is an <em>absolute unitary symmetry</em>. In the
previous post we showed that the only way is if either</p>
<p>\(P_{\sigma}{\vert \Psi \rangle} = {\vert \Psi \rangle},\)
which is the
bosonic case, or if</p>
\[P_{\sigma}{\vert \Psi \rangle} = (\operatorname{sgn}\sigma){\vert \Psi \rangle},\]
<p>which is the fermionic case.</p>
<p>Same as before, we will find an explicit basis for these bosonic and
fermionic subspaces. In this case the fermionic subspace is quite simple
(perhaps <em>too</em> simple!), so we will start with it. Write
${\vert \Psi \rangle}$ again as</p>
\[\begin{aligned}
{\vert \Psi \rangle} &=\phantom{+} \alpha_{111}{\vert aaa \rangle} + \alpha_{112}{\vert aab \rangle} + \alpha_{121}{\vert aba \rangle} + \alpha_{122}{\vert abb \rangle}\\
&\phantom{=} + \alpha_{211}{\vert baa \rangle} + \alpha_{212}{\vert bab \rangle} +
\alpha_{221}{\vert bba \rangle} +\alpha_{222}{\vert bbb \rangle},
\end{aligned},\]
<p>and suppose that for <em>all</em>
${\sigma \in \mathfrak{S}_3}$,
$P_\sigma{\vert \Psi \rangle}=(\operatorname{sgn}\sigma){\vert \Psi \rangle}$.
The sign of a permutation of $3$ numbers is $1$ if it is a cyclic
permutation of $(1,2,3)$, and $-1$ if it is a cyclic permutation of
$(1,3,2)$, as shown in the image below.</p>
<p><img src="/assets/posts/fock-2-state/3-perm-sgn.png" alt="" height="228px" width="600px" /></p>
<p>For example, consider $\sigma = (1,3,2)$, for which
$\operatorname{sgn}\sigma = -1$. Note that $\sigma$ switches the
elements $2$ and $3$. Then we have</p>
\[\begin{aligned}
P_\sigma{\vert \Psi \rangle} &=\phantom{+} \alpha_{111}{\vert aaa \rangle} + \alpha_{112}{\vert aba \rangle} + \alpha_{121}{\vert aab \rangle} + \alpha_{122}{\vert abb \rangle}\\
&\phantom{=} + \alpha_{211}{\vert baa \rangle} + \alpha_{212}{\vert bba \rangle} +
\alpha_{221}{\vert bab \rangle} +\alpha_{222}{\vert bbb \rangle}.
\end{aligned}\]
<p>However, we also need
$P_\sigma{\vert \Psi \rangle}=-{\vert \Psi \rangle}$, so comparing above
this implies that</p>
\[\begin{aligned}
\alpha_{111} &= -\alpha_{111} & \alpha_{122}&=-\alpha_{122} & \alpha_{211} &=-\alpha_{211} & \alpha_{222} &=-\alpha_{222}\\
& & \alpha_{112} &= -\alpha_{121} & \alpha_{212}&=-\alpha_{221}, & &\end{aligned}\]
<p>and so $\alpha_{111}=\alpha_{122}=\alpha_{211}=\alpha_{222}=0$. Thus we
write ${\vert \Psi \rangle}$ as</p>
\[{\vert \Psi \rangle} = \alpha_{112}({\vert aab \rangle}-{\vert aba \rangle}) + \alpha_{212}({\vert bab \rangle}-{\vert bba \rangle}).\]
<p>Now we consider $\sigma = (3,1,2)$, which is simply a cyclic permutation
of $(1,2,3)$, and so $\operatorname{sgn}\sigma = 1$. We then have</p>
\[P_{\sigma}{\vert \Psi \rangle} = \alpha_{112}({\vert baa \rangle}-{\vert aab \rangle}) + \alpha_{212}({\vert bba \rangle}-{\vert abb \rangle}) ={\vert \Psi \rangle},\]
<p>and this again tells us that $\alpha_{112}=-\alpha_{112}$ and
$\alpha_{212}=-\alpha_{212}$, so $\alpha_{112}=\alpha_{212}=0$.
Therefore ${\vert \Psi \rangle}=0$ (!). This tells us that <strong>there are
no 3-particle fermionic states</strong> if the single-particle space has only
$2$ states! We see this again as a case of <strong>Pauli’s exclusion
principle</strong>: there cannot be two particles in the same state, but we
have $3$ particles that have to fit into only 2 states!</p>
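<p>One way to confirm this collapse computationally (a sketch, using the signed average over all permutations that will be constructed in general later in this post): antisymmetrizing each of the eight basis kets of $\otimes^3 \mathcal{H}$ gives exactly zero, so the fermionic subspace is trivial.</p>

```python
from itertools import permutations

def sign(sigma):
    """Sign of a permutation of 3 numbers, via counting inversions."""
    inv = sum(1 for i in range(3) for j in range(i + 1, 3) if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def antisymmetrize(ket):
    """Signed average (1/3!) sum_sigma sgn(sigma) P_sigma |ket>, as a coefficient dict."""
    out = {}
    for sigma in permutations((1, 2, 3)):
        new_ket = "".join(ket[i - 1] for i in sigma)
        out[new_ket] = out.get(new_ket, 0) + sign(sigma) / 6
    return out

# Antisymmetrizing every basis ket of (C^2) (x) (C^2) (x) (C^2) gives zero:
for ket in ("aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb"):
    assert all(abs(c) < 1e-12 for c in antisymmetrize(ket).values())
print("no nonzero 3-particle fermionic states in a 2-state system")
```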
<p>The bosonic case is a little bit more tedious, but let’s get to it.
Again, expand $\Psi$ in terms of the basis vectors, and let’s impose
$P_{\sigma}{\vert \Psi \rangle}={\vert \Psi \rangle}$ for all
$\sigma\in \mathfrak{S}_3$. Let’s consider a small term, for example
$\alpha_{112}{\vert aab \rangle}$. Applying $P_{\sigma}$, we obtain
$\alpha_{112}{\vert baa \rangle}$, but the <em>original</em> coefficient that
goes next to ${\vert baa \rangle}$ in the expansion of
${\vert \Psi \rangle}$ is $\alpha_{211}$, so this tells us that</p>
<p>\(\alpha_{112}=\alpha_{211}.\)
Similarly, when we apply, for example
$\sigma = (1,3,2)$ (as above!) we must have $\alpha_{112}=\alpha_{121}$.
Therefore $\alpha_{112}=\alpha_{211}=\alpha_{121}$. Applying this exact
same analysis, but to the term $\alpha_{221}{\vert bba \rangle}$, we
obtain that $\alpha_{221}=\alpha_{212}=\alpha_{122}$, so that
${\vert \Psi \rangle}$ can be written as:</p>
\[{\vert \Psi \rangle} = \alpha_{111}{\vert aaa \rangle} +\alpha_{112}({\vert aab \rangle}+{\vert aba \rangle}+{\vert baa \rangle}) + \alpha_{221}({\vert bba \rangle}+{\vert bab \rangle} + {\vert abb \rangle}) +\alpha_{222}{\vert bbb \rangle}.\]
<p>This tells us that the subspace of bosonic states is generated by four
vectors, namely:</p>
\[{\vert aaa \rangle}; \qquad {\vert aab \rangle}+{\vert aba \rangle}+{\vert baa \rangle};\qquad {\vert bba \rangle}+{\vert bab \rangle} + {\vert abb \rangle}; \text{ and}\qquad {\vert bbb \rangle}.\]
<p>Note that each of these elements is the sum over all the possible
permutations of one of its terms. For example, the element
${\vert aab \rangle}+{\vert aba \rangle}+{\vert baa \rangle}$ is the sum
over all permutations of ${\vert aab \rangle}$. In the case of
${\vert aaa \rangle}$ and ${\vert bbb \rangle}$, every permutation
gives back the same ket! This is also true for the case of two
particles: each of the basis elements of the bosonic space is the sum
over all permutations of $2$ numbers (of which there are only two!) of
the different basis elements.</p>
<p>Now, similarly as above, we normalize these four vectors (they are
already orthogonal, why?), and obtain an orthonormal basis</p>
\[\begin{aligned}
{\vert 3,0 \rangle} &:= {\vert aaa \rangle},\\
{\vert 2,1 \rangle} &:= \frac{1}{\sqrt{3}}\left({\vert aab \rangle}+{\vert aba \rangle}+{\vert baa \rangle}\right),\\
{\vert 1,2 \rangle} &:= \frac{1}{\sqrt{3}}\left({\vert bba \rangle}+{\vert bab \rangle} + {\vert abb \rangle}\right),\\
{\vert 0,3 \rangle} &:= {\vert bbb \rangle}.\end{aligned}\]
<p>Once
again, we interpret ${\vert 3,0 \rangle}$ as a state where there are
$3$-particles in state ${\vert a \rangle}$ and no particles in
${\vert b \rangle}$; we interpret ${\vert 2,1 \rangle}$ as a state with
$2$ particles in state ${\vert a \rangle}$ and one in state
${\vert b \rangle}$, and so on.</p>
<p><img src="/assets/posts/fock-2-state/3-state-boson.png" alt="" /></p>
<p>Once again, note that these <em>are not the only states</em>. Any linear
combination of
${\vert 3,0 \rangle},{\vert 2,1 \rangle},{\vert 1,2 \rangle},{\vert 0,3 \rangle}$
is again a valid bosonic state!</p>
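<p>As a sanity check (a NumPy sketch, with $|a\rangle,|b\rangle$ taken as the standard basis of $\mathbb{C}^2$), we can verify that the four occupation-number vectors above really are orthonormal in $(\mathbb{C}^2)^{\otimes 3}$:</p>

```python
import numpy as np

# Build product kets like |aab> as vectors in (C^2)^(x3) via Kronecker products.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

def ket(s):
    v = np.array([1.0])
    for ch in s:
        v = np.kron(v, a if ch == "a" else b)
    return v

# The occupation-number basis for 3 bosons in a 2-state system:
v30 = ket("aaa")
v21 = (ket("aab") + ket("aba") + ket("baa")) / np.sqrt(3)
v12 = (ket("bba") + ket("bab") + ket("abb")) / np.sqrt(3)
v03 = ket("bbb")

# Their Gram matrix should be the 4x4 identity.
B = np.column_stack([v30, v21, v12, v03])
print(np.allclose(B.T @ B, np.eye(4)))  # True
```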
<p>These examples should now give us a good foothold for…</p>
<h1 id="sec:general-case">The general case</h1>
<p>Now we will consider a general Hilbert space $ \mathcal{H}$ that allows
for a countable basis. We will call the tensor product
$\otimes^{k} \mathcal{H}$ the $k$-particle space <em>without statistics</em>, since it does not have indistinguishability implemented yet.</p>
<h2 id="sec:symm-or-boson">Symmetrization and antisymmetrization (or bosonization and fermionization, if you will)</h2>
<p>Our previous analysis gives us a suggestion: on
$\otimes^k \mathcal{H}$, define the <strong><em>symmetrization operator</em></strong>
${\pi_{+}}:\otimes^{k} \mathcal{H}\to \mathcal{S}^k \mathcal{H}$ as</p>
\[{\pi_{+}}{\vert \Psi \rangle} = \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}P_{\sigma}{\vert \Psi \rangle}.\]
<p>This operator returns the sum over all possible permutations of
${\vert \Psi \rangle}$, and we add the $1/k!$ to counteract some
overcounting. For example, if we apply $\pi_+$ to ${\vert aaa \rangle}$,
we have that $P_{\sigma}{\vert aaa \rangle}={\vert aaa \rangle}$ for all
$\sigma$, and therefore we are repeating the sum over the same element
$3!=6$ times, so the $1/3!$ gets rid of that. Our previous analysis
suggests that for <em>any</em> element
${\vert \Psi \rangle}\in \otimes^k \mathcal{H}$, its symmetrization is
a bosonic state, i.e.
$\pi_+{\vert \Psi \rangle}\in \mathcal{S}^{k} \mathcal{H}$, since any
bosonic state can be written as a linear combination of elements of the
form $\pi_+{\vert \Psi \rangle}$ (at least, we showed this for $k=2$ and
$k=3$ in the case $ \mathcal{H}\cong {\mathbb{C}}^2$ is a $2$-state
system). This means that for all $\sigma’\in \mathfrak{S}_k$, applying
$P_{\sigma’}$ to a symmetrized state should yield the same state:</p>
\[\begin{aligned}
P_{\sigma'}({\pi_{+}}{\vert \Psi \rangle}) &= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}P_{\sigma'}P_{\sigma}{\vert \Psi \rangle}\\
&= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}P_{\sigma' \sigma}{\vert \Psi \rangle}\\
&= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}P_{\sigma}{\vert \Psi \rangle}\\
&= {\pi_{+}}{\vert \Psi \rangle}.
\end{aligned}\]
<p>The third equality follows from the fact that the set
$\left\{ \sigma’\sigma\vert \sigma\in \mathfrak{S}_k \right\}$ is
precisely $ \mathfrak{S}_k$. This is true because, for fixed $\sigma’$,
every $\sigma\in \mathfrak{S}_k$ can be written as
$\sigma = \sigma’(\sigma’^{-1}\sigma)$, i.e. in the form $\sigma’\tau$ for some $\tau\in \mathfrak{S}_k$. Then
${\pi_{+}}{\vert \Psi \rangle}\in \mathcal{S}^k \mathcal{H}$, as
promised.</p>
<p>Similarly, we can define the <strong>antisymmetrization operator</strong>
$\pi_-:\otimes^{k} \mathcal{H}\to \Lambda^{k} \mathcal{H}$ as</p>
\[\pi_-{\vert \Psi \rangle} = \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}\operatorname{sgn}(\sigma)P_{\sigma}{\vert \Psi \rangle}.\]
<p>Indeed, applying $\pi_-$ to a vector ${\vert \Psi \rangle}$ returns an
antisymmetric state (i.e. a fermionic state). This means that applying a
permutation operator $P_{\sigma’}$ to an antisymmetrized state should
yield the same state times the sign of the permutation:</p>
\[\begin{aligned}
P_{\sigma'}({\pi_{-}}{\vert \Psi \rangle}) &= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}\operatorname{sgn}(\sigma)P_{\sigma'}P_{\sigma}{\vert \Psi \rangle}\\
&= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}\operatorname{sgn}(\sigma)P_{\sigma' \sigma}{\vert \Psi \rangle}\\
&= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}\operatorname{sgn}({\sigma'}^{-1}\sigma)P_{\sigma}{\vert \Psi \rangle}\quad(\star)\\
&= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}\operatorname{sgn}(\sigma')\operatorname{sgn}(\sigma)P_{\sigma}{\vert \Psi \rangle} \\
&= \frac{\operatorname{sgn}(\sigma')}{k!}\sum_{\sigma\in \mathfrak{S}_k}\operatorname{sgn}(\sigma)P_{\sigma}{\vert \Psi \rangle}\\
&=\operatorname{sgn}(\sigma'){\pi_{-}}{\vert \Psi \rangle}.
\end{aligned}\]
<p>The passing to equation $(\star)$ follows from
performing a “change of variables” $\sigma\to \sigma’^{-1}\sigma$. This
change of variables is legitimate since, again
$\left\{ \sigma’^{-1}\sigma\vert\sigma\in \mathfrak{S}_k \right\}= \mathfrak{S}_k$.</p>
<p>Then, as expected, applying the (anti)symmetrization operator on a
vector returns an element of the (anti)symmetric space. But even more,
these (anti)symmetrization operators <em>precisely</em> define the bosonic and
fermionic spaces; that is,</p>
\[\begin{aligned}
\mathcal{S}^{k} \mathcal{H} &= {\pi_{+}}(\otimes^k \mathcal{H}),\\
\Lambda^{k} \mathcal{H} &= {\pi_{-}}(\otimes^k \mathcal{H}).
\end{aligned}\]
<p>Which is to say, every (anti)symmetric vector is the
(anti)symmetrization of some other vector. One of the inclusions we
already proved, the other follows from noting, a little bit trivially,
that if ${\vert \Psi \rangle}\in \mathcal{S}^k \mathcal{H}$, then</p>
\[\begin{aligned}
{\pi_{+}}{\vert \Psi \rangle} &=\frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}P_\sigma{\vert \Psi \rangle}\\
&= \frac{1}{k!}\sum_{\sigma\in \mathfrak{S}_k}{\vert \Psi \rangle}\\
&={\vert \Psi \rangle},
\end{aligned}\]
<p>and similarly for $\Lambda^k \mathcal{H}$. The
antisymmetrization operator (and in general the fermion space) has a few
caveats, though. Note that above we saw that the fermion space of $3$
particles for a $2$-state system is… problematic. We can’t fit 3
Pauli-excluding particles in $2$ states, so the whole thing collapses to
zero. This is in fact a general feature of fermionic spaces of
<em>finite</em>-dimensional Hilbert spaces.</p>
<p>Specifically, if $ \mathcal{H}$ is $n$-dimensional, then for all $k>n$,
we have $\Lambda^k \mathcal{H}=\left\{ 0 \right\}$.</p>
<p>To see this consider the case $k=n+1$. Consider an orthonormal basis
$\left\{ {\vert \xi_1 \rangle},\dots,{\vert \xi_n \rangle} \right\}$ of
$ \mathcal{H}$. Now let’s try to take $\Lambda^{n+1} \mathcal{H}$. As
we saw above, any element of $\Lambda^{n+1} \mathcal{H}$ is the
antisymmetrization of something else, so let’s take an element of the
natural basis of $\otimes^{n+1} \mathcal{H}$. This element is of the
form
\({\vert \xi_{i_1}\dots\xi_{i_{n+1}} \rangle}.\)
However, since
there are more “slots” than vectors that we can fill them with ($k>n$),
there must be <em>some</em> repeated elements. For example, we might take</p>
\[{\vert \Psi \rangle}={\vert \xi_1\xi_1\xi_2\xi_3\dots\xi_n \rangle},\]
<p>where $\xi_1$ is repeated so that it fills up all the slots. When we take
the antisymmetrization of ${\vert \Psi \rangle}$, something happens
which is illustrated by the following example: consider the identity
permutation $\sigma=(1,2,3,\dots,n+1)$, which <em>does nothing</em>, and the one
that transposes the first two elements, $\sigma’=(2,1,3,4,\dots,n+1)$.
These two differ by a transposition, so
$\operatorname{sgn}(\sigma)=-\operatorname{sgn}(\sigma’)$, and therefore
when we add everything we will obtain two terms</p>
\[\operatorname{sgn}(\sigma)P_{\sigma}{\vert \xi_1\xi_1\xi_2\xi_3\dots\xi_n \rangle} + \operatorname{sgn}(\sigma')P_{\sigma'}{\vert \xi_1\xi_1\xi_2\xi_3\dots\xi_n \rangle}= {\vert \xi_1\xi_1\xi_2\xi_3\dots\xi_n \rangle} - {\vert \xi_1\xi_1\xi_2\xi_3\dots\xi_n \rangle} = 0.\]
<p>The action of $\sigma’$ on ${\vert \Psi \rangle}$ is exactly the same as
the one of $\sigma$, precisely because of the repeated elements. This
tells us, more generally, that</p>
\[\pi_-{\vert \psi_1\dots \xi\dots \xi\dots \psi_{n} \rangle} = 0,\]
<p>whenever there are repeated elements. This, as above, happens because
for every permutation $\sigma$, there is another permutation $\sigma’$
which differs from $\sigma$ by only one transposition, which precisely
transposes the places where the repeated elements land. Thus, everything
collapses to zero.</p>
<p>This is again Pauli’s exclusion principle: If you have $n$ states but
$k>n$ particles that cannot share the same state, you cannot fit them
all!</p>
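<p>Everything in this section can be checked numerically in one go. The sketch below (NumPy, nothing beyond the definitions in this post) builds the matrices of $P_\sigma$ on $(\mathbb{C}^2)^{\otimes 3}$ in the product basis, forms $\pi_\pm$, and verifies that they are projections whose traces equal the dimensions of their images: $4$ for $\mathcal{S}^3 \mathcal{H}$ and $0$ for $\Lambda^3 \mathcal{H}$, exactly as we found by hand.</p>

```python
import numpy as np
from itertools import permutations, product

def sign(sigma):
    """Sign of a permutation (as a tuple), via counting inversions."""
    inv = sum(1 for i in range(len(sigma))
                for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def perm_matrix(sigma, d=2):
    """Matrix of P_sigma on (C^d)^(x k) in the product basis (big-endian indexing)."""
    k = len(sigma)
    P = np.zeros((d**k, d**k))
    for idx in product(range(d), repeat=k):
        src = sum(i * d**(k - 1 - n) for n, i in enumerate(idx))
        tgt_idx = tuple(idx[s - 1] for s in sigma)           # slot s receives idx[sigma(s)]
        tgt = sum(i * d**(k - 1 - n) for n, i in enumerate(tgt_idx))
        P[tgt, src] = 1
    return P

k = 3
group = list(permutations(range(1, k + 1)))
pi_plus = sum(perm_matrix(s) for s in group) / len(group)
pi_minus = sum(sign(s) * perm_matrix(s) for s in group) / len(group)

print(np.allclose(pi_plus @ pi_plus, pi_plus))    # True: pi_+ is a projection
print(np.allclose(pi_minus @ pi_minus, pi_minus)) # True: pi_- is a projection
print(round(np.trace(pi_plus)))   # 4: the dimension of the 3-boson space
print(round(np.trace(pi_minus)))  # 0: the 3-fermion space is trivial
```

The trace of a projection equals the dimension of its image, which is why the last two lines recover the dimensions of the bosonic and fermionic subspaces.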
<h1 id="the-fock-space">The Fock space</h1>
<p>Now that we have the spaces for $k$ identical particles, we might want
to extend it to hold <em>infinitely many</em>!</p>
<p>But first, let’s first consider a space that allows for arbitrarily many
particles <em>without statistics</em>. This can be done by defining the <strong>Fock
space</strong> (without statistics) $ \mathcal{F}( \mathcal{H})$ as the
direct sum</p>
\[\mathcal{F}( \mathcal{H}) = \bigoplus_{k=0}^{\infty}\otimes^k \mathcal{H},\]
<p>where the “$0$-th level” is $\otimes^0 \mathcal{H}:= {\mathbb{C}}$. To
be clear, this is the <em>analytic</em> direct sum, i.e. the vector space of
sequences $(a_0,a_1,\dots)$, with $a_k\in \otimes^k \mathcal{H}$, such
that $\sum_{k=0}^\infty\|a_k\|^2<\infty$. We might also write the
sequence $(a_0,a_1,\dots)$ as a sum $\sum_{k=0}^\infty a_k$, again with
each $a_k\in \otimes^k \mathcal{H}$. This is a <em>direct sum</em>, so the
elements of different number $k$ do not interact with one another.</p>
<p>The inner product on $ \mathcal{F}( \mathcal{H})$ is defined as</p>
\[{\left\langle a \middle\vert b\right\rangle}:=\sum_{k=0}^\infty{\left\langle a_k \middle\vert b_k\right\rangle},\]
<p>thus forming the Hilbert space<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> that we want.</p>
<p>In this Fock space, define the <strong>vacuum</strong> state, denoted by
${\vert 0 \rangle}$ or ${\vert \Omega \rangle}$, as</p>
\[{\vert 0 \rangle}={\vert 0,0,\dots \rangle}:=1\in \otimes^0 \mathcal{H}={\mathbb{C}}.\]
<p>This represents a state with <em>no particles at all</em>. Do not confuse the
vacuum state ${\vert 0 \rangle}$ with the zero vector
$0\in \mathcal{H}$ or the element $0\in {\mathbb{C}}$!</p>
<p>Similarly, we can consider the <strong>boson and fermion Fock spaces</strong>,
$ \mathcal{F}_+( \mathcal{H})$ and $ \mathcal{F}_-( \mathcal{H})$,
respectively, as</p>
\[\begin{aligned}
\mathcal{F}_+( \mathcal{H}) &:= \bigoplus_{k=0}^\infty \mathcal{S}^k \mathcal{H}\\
\mathcal{F}_-( \mathcal{H}) &:= \bigoplus_{k=0}^\infty\Lambda^k \mathcal{H}.\end{aligned}\]
<p>Since this might be a bit too abstract, let’s give examples of elements
of each space. Suppose that $ \mathcal{H}$ has an infinite countable
basis $\left\{ {\vert \xi_n \rangle} \right\}_{n=0}^{\infty}$ (these may
be, for example, energy eigenstates for a good Hamiltonian, like the
harmonic oscillator or the infinite square well). We take an
infinite-dimensional space to avoid the collapse of the fermionic spaces
that we described above.</p>
<p>The simplest elements of these spaces are simply <em>finite</em> sums of
elements of each component space. This guarantees the sum of the squared
norms to be convergent.</p>
<p>For a non-trivial element of $ \mathcal{F}( \mathcal{H})$ which is
neither bosonic nor fermionic, consider</p>
\[{\vert \Psi \rangle} = \sum_{n=0}^\infty\frac{1}{2^n}{\vert \xi_1\dots\xi_n \rangle} = {\vert 0 \rangle} + \frac{1}{2}{\vert \xi_1 \rangle}+\frac{1}{4}{\vert \xi_1\xi_2 \rangle} + \dots\]
<p>Indeed, the $n$-th term of this (formal) sum is
$a_n=2^{-n}{\vert \xi_1\dots\xi_n \rangle}\in\otimes^n \mathcal{H}$,
and it satisfies</p>
\[\sum_{n=0}^\infty \|a_n\|^2 = \sum_{n=0}^\infty\frac{1}{2^{2n}}{\left\langle \xi_1\dots\xi_n \middle\vert \xi_1\dots\xi_n\right\rangle} = \sum_{n=0}^{\infty}\frac{1}{4^{n}} <\infty.\]
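<p>As a quick numerical aside, the squared norm is a geometric series, $\sum_{n\geq 0} 4^{-n} = 4/3$, so the state has finite norm $2/\sqrt{3}$; a one-line check:</p>

```python
# Partial sums of the geometric series sum_n (1/4)^n converge to 4/3.
s = sum(0.25 ** n for n in range(60))
print(abs(s - 4 / 3) < 1e-12)  # True
```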
<p>Now for a nontrivial bosonic element, we can simply consider</p>
\[\sum_{n=0}^{\infty}\frac{1}{2^n}\pi_+{\vert \xi_1\dots\xi_n \rangle} ={\vert 0 \rangle} +\frac{1}{2}{\vert \xi_1 \rangle}+ \frac{1}{4\cdot 2!}({\vert \xi_1\xi_2 \rangle}+{\vert \xi_2\xi_1 \rangle}) + \frac{1}{8\cdot 3!}({\vert \xi_1\xi_2\xi_3 \rangle} + \dots + {\vert \xi_3\xi_2\xi_1 \rangle}) +\dots.\]
<p>Similarly, for a fermionic element,</p>
\[\sum_{n=0}^{\infty}\frac{1}{2^n}\pi_-{\vert \xi_1\dots\xi_n \rangle} ={\vert 0 \rangle} +\frac{1}{2}{\vert \xi_1 \rangle}+ \frac{1}{4\cdot 2!}({\vert \xi_1\xi_2 \rangle}-{\vert \xi_2\xi_1 \rangle}) + \frac{1}{8\cdot 3!}({\vert \xi_1\xi_2\xi_3 \rangle} + \dots - {\vert \xi_3\xi_2\xi_1 \rangle}) +\dots.\]
<p>Note that these examples are not normalized. We may, however, normalize
them since their norms are finite.</p>
<p>These examples might look contrived, but they are general elements of
the total Fock spaces. However, physicists use Fock spaces in terms of a
specific basis that is induced by a basis of $\mathcal{H}$. This
induced basis is a generalization (and completion) of the bases
${\vert 2,0 \rangle},{\vert 1,1 \rangle},{\vert 0,2 \rangle}$, etc.,
that we found above, and is sometimes called the <strong>occupation number
basis</strong> or <strong>occupation number representation</strong>. The occupation number
basis makes it easier to solve the counting problems that are needed for
statistical mechanics, and we will discuss it in depth in a future post.</p>
<h1 id="sec:summary">In summary</h1>
<p>We presented bosonization and fermionization operators that act on the
spaces of $k$ <em>distinguishable</em> particles and output bosonic or
fermionic states. We also saw that Pauli’s exclusion principle is
general: fermionic particles can never share the same state, and this
means that if the dimension of the one-particle state space is finite
(say of dimension $n$), then the fermionic space of $k>n$ particles is
<em>meaningless</em>.</p>
<p>We also constructed (algebraically) a space that can represent states
with an <em>arbitrary</em> number of particles, in bosonic, fermionic, and
statistics-free flavors.</p>
<p>What comes next is to describe <em>how to work</em> and do interesting stuff
with the Fock spaces, and hopefully this fun will take us all the way to
the idea of a <em>quantum field</em>.</p>
<h1 id="sec:references">References</h1>
<ul>
<li>Folland, G. B. (2008). <em>Quantum Field Theory: A Tourist Guide for
Mathematicians</em>, Chapter 4.</li>
<li>Dimock, J. (2011). <em>Quantum Mechanics and Quantum Field Theory: A Mathematical Primer</em>, Chapter 5.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>There are a few technical details and subtleties when the
one-particle state spaces are infinite-dimensional. In good
physicist fashion, we will simply assume that everything behaves as
in finite dimensions. Who likes analysis, anyway? <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Santiago Quintero de los RíosHere we will construct state spaces with an *arbitrary number of particles*, instead of a *fixed* number of particles. We first focus on a 2-state system, for example polarization of photons or spin-(1/2).Breaking the rules: The physicist’s flexible mathematics2018-12-02T00:00:00+01:002018-12-02T00:00:00+01:00http://homotopico.com/honest-physics/2018/12/02/breaking-the-rules<p>If you look at the curriculum of any respectable Physics program, you’ll find mathematics courses like calculus, linear algebra, differential equations, maybe some complex analysis and differential geometry. It seems at a first glance that physicists use the <strong>results</strong> of mathematical theories (e.g. theorems from calculus, linear algebra, functional analysis, etc.), under certain “physical” interpretations of the objects that are used therein (e.g. forces are vector fields, trajectories are curves, quantum states are elements of a Hilbert space), to arrive at meaningful expressions concerning physical entities. That is, there is some kind of dictionary that tells you: this mathematical object represents this physical entity.</p>
<p>This is especially the case in classical physics, where, for example, it can be seen that the generalized coordinates that are used in (holonomic) systems correspond to coordinates on a specific manifold, and time-evolution corresponds to moving on a curve through that manifold. There is a clear dictionary. Curves to particles, vector fields to forces. Even in non-relativistic quantum mechanics one can find these kinds of dictionaries. It gets to the point where one can state this dictionary quite explicitly: Many a book on “quantum mechanics for mathematicians” coldly states that quantum mechanics is <em>simply</em> a representation of the Heisenberg algebra on a projective separable complex Hilbert space (uh… sure.).</p>
<p>But <em>physicists don’t give a damn</em> about what the Hilbert space is (or whether it is even a Hilbert space), and they don’t give a damn about flows on manifolds that give rise to Hamilton’s equations or the probability measures that describe the canonical ensemble. They don’t care about the dictionary: only mathematicians do, because it gives them confidence that the “physics” they are doing is well-founded.</p>
<p>Now, it is clear at the outset that physicists think of and use mathematics very differently from mathematicians (otherwise physicists would simply be called “mathematicians”). The physicist’s goal is not necessarily to prove abstract theorems about the objects she is dealing with, but rather to say something meaningful about the physical entities that her mathematical objects represent. So it is not necessary to drown in details of the mathematics involved. Physicists <em>use</em> mathematics.</p>
<p>Kind of. It seems that they do not use the same “flavor” (if you want to call it that, and I do) of mathematics that the mathematicians do. In physics, one does not care about the nuances of the axiomatic system one works with. As Folland put it in his QFT book (second ed. pp 47-48):</p>
<blockquote>
<p>Mathematicians are trained to think that all mathematical objects should be realized in some specific way as sets. When we do calculus on the real line we may not wish to think of real numbers as Dedekind cuts or equivalence classes of Cauchy sequences of rationals, but it reassures us to know that we can do so. Physicists’ thought processes, on the other hand, are anchored in the physical world rather than in set theory, and they see no need to tie themselves down to specific set-theoretic models.</p>
</blockquote>
<p>He then goes on to explain that physicists work much more symbolically. It is as if they are working with a logical system (first order, second order, I don’t know) where there is only syntax but no semantics. That is, we only consider the relationships between the symbols, with disregard for whether those symbols with those specific relations can be realized by a specific set. We introduce the notion of functions, vectors, tensors, whatevers, and the notion of addition, multiplication, derivation, integration, etc., purely symbolically: if you have these symbols, you can arrange them in this order, and if you put this integration symbol there, you can exchange the whole lot for this other set of symbols.</p>
<p>So, physicists use symbols with syntax but not semantics? Not exactly, because the mathematical objects they are dealing with often correspond to physical (“““real”””, in many quotation marks) entities. Hence we do not accept certain solutions to equations, on the grounds of “physicality” (oh, for certain values of mass the height of this parabola is imaginary? Cross it out). These physicality assumptions end up being mathematical conditions on the objects we deal with. For example, the “measurable” observables should all be real numbers, and this implies that observables in quantum mechanics are represented by <em>Hermitian</em> operators. Then one may think that these “physicality conditions” can simply be added as axioms of the theory (e.g. add a “for all x, x is observable implies x is real”).</p>
<p>However, this implicitly states that (at least within the realm of a particular physical theory) there is an axiomatic system under which a physicist works. But upon further inspection, this does not seem to hold very well. First, the different areas of physics are not as well-delineated as those in mathematics, so it is unclear at the outset what this axiomatic system would be. But even if the physicist assumes she is working in an explicit framework (say, for example, quantum mechanics), she might use results and ideas and objects from another framework that might not <em>a priori</em> fit well within the first. This often requires reinterpretation (oh, yeah X is discrete and I want to interpret it as Y, which is not, so let’s say that X is a sampling or approximation of Y), but in some cases even the reinterpretation is inconsistent with the original framework. Furthermore, the physicist is willing to <strong>break the mathematical rules</strong> under which she is working, in order to press onward with her theory.</p>
<p><em>Physicists are willing to break the rules of the game in order to keep the game going.</em></p>
<p>My favorite example of this is Dirac’s development of an equation for relativistic quantum mechanics. To be brief, Dirac’s idea was to find the <em>square root</em> of the Klein-Gordon operator. His motivations are not important. What is important is that the Klein-Gordon operator is a differential operator that acts on real (or complex) functions of R^4, what we call <em>scalar</em> functions, and it turns out that it is impossible to find such a square root.</p>
<p>That is, if you work constrained to your axiomatic system of differential operators acting on scalar functions. Dirac wasn’t feeling constrained. So he did what physicists do: Who gives a damn about rules? He wrote down an object with some free parameters whose square would be the Klein-Gordon operator, and tried to figure out what those parameters needed to be. The problem is that those free parameters were apparently inconsistent: they needed to be anti-commuting: ab=-ba.</p>
<p>That doesn’t happen for complex numbers. Then Dirac realized that he could still find his square root if he allowed for <em>matrix</em>-valued fields, instead of scalar-valued fields, so that the anti-commutativity of the parameters could make sense. So Dirac broke the rules, but in a good way. If he had stopped at “welp, complex numbers are not anticommutative”, it would’ve been the end of it. No relativistic quantum mechanics (at least not as we know it, possibly).</p>
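Dirac’s resolution can be checked directly. The sketch below (my own illustration, using the standard Dirac-representation choice of the gamma matrices, which the post does not write down) verifies the Clifford relation $\{\gamma^\mu,\gamma^\nu\}=2\eta^{\mu\nu}I$ and that squaring $\gamma^\mu p_\mu$ recovers the Klein-Gordon quadratic form $p_\mu p^\mu$:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Dirac-representation gamma matrices: g0 = diag(I, -I), gk = [[0, s], [-s, 0]]
g0 = np.kron(np.diag([1.0, -1.0]).astype(complex), I2)
off = np.array([[0, 1], [-1, 0]], dtype=complex)
gamma = [g0] + [np.kron(off, s) for s in (sx, sy, sz)]

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # mostly-plus-minus Minkowski metric

# the anticommutation relations that scalars cannot satisfy
for mu in range(4):
    for nu in range(4):
        anti = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
        assert np.allclose(anti, 2 * eta[mu, nu] * np.eye(4))

# squaring gamma^mu p_mu gives (p . p) times the identity: the Klein-Gordon form
p = np.array([2.0, 0.3, -0.5, 0.7])   # an arbitrary four-momentum (example values)
p_lower = eta @ p
slash = sum(p_lower[mu] * gamma[mu] for mu in range(4))
print(np.allclose(slash @ slash, (p @ eta @ p) * np.eye(4)))
```

The anticommutators vanish for distinct indices, which is exactly the relation $ab=-ba$ that forced Dirac out of scalar-valued fields.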
<p>Physicists are willing to break the rules if it suits them, but they otherwise play by the rules of the game, the game being calculus, algebra, etc. Those rules can, and will be, overruled if necessary. But <strong>not</strong> all of them, only… most of them. There are some absolutely unbreakable rules in physics, mostly regarding causality and thermodynamics. It’s important to note that mathematicians also break the rules, but they are much more careful and systematic about it. A theorem is proven under certain assumptions, but sometimes it is interesting to see what happens if the assumptions are relaxed. Similarly, it is interesting to see what happens when the axioms and fundamental definitions of a theory are relaxed. That is how we get complex numbers, and… well… everything else that is weird in mathematics.</p>
<p>Physicists, in the end, are willing to do <em>almost</em> anything (and given some fun stuff in high energy physics I’m tempted to remove the “almost”), as long as one obtains a meaningful statement about physical reality. They use a Machiavellian flavor of mathematics, where the ends justify the means. How valid is this? Well, it has mostly worked. We have computers, and the internet, and smartphones, and robots, and nuclear weapons, and nuclear-powered robots on Mars. There is something that is <em>right</em> about this approach to physics, about this flexible use of mathematics. It seems that mathematics, and I mean precise, rigorous mathematics, is more of a <em>guide</em> than a fixed toolset with fixed rules. And that is entirely okay, as long as we’re honest about it.</p>Santiago Quintero de los RíosWhat kind of mathematics do physicists use? It's a flexible kind of mathematics, where one can break the rules of the game in order to keep the game going.Multiparticle spaces in quantum mechanics, part 1: Indistinguishability2018-10-11T00:00:00+02:002018-10-11T00:00:00+02:00http://homotopico.com/multiparticle%20states/2018/10/11/multiparticle-states<p>Get this post <a href="/assets/docs/pdf_posts/multiparticle-states.pdf">in PDF format here.</a></p>
<p>Suppose you already know how to treat one-particle quantum-mechanical
systems, and now you wish to go big, go statistical, or maybe go to
fields. It is therefore necessary to understand systems with multiple
(identical) particles, and how to represent them mathematically.</p>
<p>The fundamental hypothesis here is that <strong>particles of the same kind are
indistinguishable</strong>. The question of how reasonable this hypothesis is
is <a href="https://youtu.be/aIKPqxLxXTY?t=3">“a good question… for another
time”</a>, so very much like
The Force Awakens’ writers, I will put a shade on that lamp, ignore the
issue, and move on <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. In the end, we should not say that “particle
$a$ is in state $x$ and particle $b$ is in state $y$”; at most we should
say “there are two particles, one in state $x$ and one in state $y$”.
How do we go about describing these systems mathematically, given that
we already know how to describe a one-particle system? It seems
cumbersome to try and work out how to mathematically write the second
statement, whereas the first one seems simpler if we already know how to
state that “a particle is in state $x$” and “a particle is in state
$y$”.</p>
<p>Then what we will do is write “particle $a$ is in state $x$ and particle
$b$ is in state $y$” mathematically, and <em>then</em> find out how to turn
this statement into the indistinguishable case “there are two particles,
one in state $x$ and one in state $y$”. We will see that the
indistinguishable case can be thought of as a “distinguishable case” but
with <em>permutation symmetry</em>.</p>
<p>This means that we first have to discuss a little bit what a symmetry
is, and then apply this to the permutation symmetry of multiparticle
states. We will obtain that there are <strong>two types</strong> of
indistinguishability, one of which naturally takes us to Pauli’s
exclusion principle.</p>
<h1 id="symmetrieszip">Symmetries.zip</h1>
<p>Consider an observable $A$ of a system $S$ (this system does not even
have to be quantum). Suppose that we have two states $\psi_1,\psi_2$ of
this system that are <em>identical</em> if we <em>only</em> compare how they “look”
with the $A$ observable (for example, if the observable $A$ is “distance
to a point $o$”, then all the points in a circle centered at $o$ are
identical for anyone who can only check the $A$ observable). What we
mean is that the probability distribution of $A$ in states $\psi_1$ and
$\psi_2$ are the same. Under this observable, the states $\psi_1$ and
$\psi_2$ are identical. In this case we say that $\psi_1$ and $\psi_2$
are $A$-equivalent or $A$-symmetric (or symmetric with respect to $A$).
There might be some other observable $B$ which can distinguish between
$\psi_1$ and $\psi_2$, or otherwise the states $\psi_1$ and $\psi_2$
will, by definition, be exactly the same.</p>
<p>In the context of quantum mechanics, we say that two states
${\left\vert \psi_1 \right\rangle}, {\left\vert \psi_2 \right\rangle}$
are $A$-symmetric if the probability distribution of the observable $A$
is the same for both. That is, they are $A$-symmetric if and only if for
all possible values $a$ of $A$, the cumulative distributions are</p>
\[P\left(A\leq a\middle\vert\psi_1\right)=P\left(A\leq a\middle\vert \psi_2\right).\]
<p>Recall then that the probability distribution of an observable $A$ is
given by the eigenvalues of an associated Hermitian operator $\hat{A}$.
Namely, if
${\left\vert a_1 \right\rangle},{\left\vert a_2 \right\rangle},\dots$
are orthonormal eigenstates with eigenvalues $a_1,a_2,\dots$, then the
probability distribution of $A$ given a state $\psi$ is given by<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
\[P(A=a_j\vert \psi) = \left\vert{\left\langle a_j \middle\vert \psi\right\rangle}\right\vert^2.\]
<p>This implies that two states
${\left\vert \psi_1 \right\rangle},{\left\vert \psi_2 \right\rangle}$
are $A$-symmetric if and only if for all eigenvalues $a_j$ of $\hat{A}$,</p>
\[\left\vert{\left\langle a_j \middle\vert \psi_1\right\rangle}\right\vert^2 = \left\vert{\left\langle a_j \middle\vert \psi_2\right\rangle}\right\vert^2.\]
<p>We say that a mapping $T:\mathcal{H}\to\mathcal{H}$ is $A$-symmetric (or
an $A$-symmetry) if for all
${\left\vert \psi \right\rangle}\in \mathcal{H}$, the states
${\left\vert \psi \right\rangle}$ and
${\left\vert \psi’ \right\rangle}=T{\left\vert \psi \right\rangle}$ are
$A$-symmetric. This means that for all eigenvalues $a_j$ of $A$,</p>
\[\left\vert{\left\langle a_j \middle\vert T\middle|\psi\right\rangle}\right\vert^2 = \left\vert{\left\langle a_j \middle\vert \psi\right\rangle}\right\vert^2.\]
<p>This immediately implies that the expectation value of $A$ is the same
for ${\left\vert \psi \right\rangle}$ and
${\left\vert \psi’ \right\rangle}$ (as can be seen by inserting some
well-placed $I$ using the completeness relation for the
${\left\vert a_j \right\rangle}$ eigenstates).</p>
<p>In particular, if $T$ is an $A$-symmetry, then for all
${\left\vert \psi \right\rangle}$ and
${\left\vert \psi’ \right\rangle}=T{\left\vert \psi \right\rangle}$ we
have that</p>
\[\begin{aligned}
\|\psi'\|^2={\left\langle \psi' \middle\vert \psi'\right\rangle} &= \sum_j\left\vert{\left\langle a_j \middle\vert \psi'\right\rangle}\right\vert^2\\
&= \sum_j\left\vert{\left\langle a_j \middle\vert \psi\right\rangle}\right\vert^2\\
&= {\left\langle \psi \middle\vert \psi\right\rangle} = \|\psi\|^2.
\end{aligned}\]
<p>And thus, $T$ is <em>norm</em>-preserving. Unfortunately,
this fact alone does not say much about $T$. However, if we also require
$T$ to be invertible and linear, then it must be a unitary
transformation, that is, $T^\dagger T = T T^\dagger = I$, or
equivalently
${\left\langle (T\psi) \middle\vert T\phi\right\rangle}={\left\langle \psi \middle\vert \phi\right\rangle}$.
This follows from a straightforward application of the polarization
identity</p>
\[{\left\langle u \middle\vert v\right\rangle} = \frac{1}{4}\left(\|u+v\|^2 - \|u-v\|^2 +i\|u-iv\|^2 -i \|u+iv\|^2\right).\]
<p>Similarly, if we require $T$ to be invertible and additive but
<em>anti</em>linear, $T(au + v)= a^*T(u)+T(v)$, then if it preserves norms it
must be antiunitary, i.e.
${\left\langle (T\psi) \middle\vert T\phi\right\rangle}={\left\langle \phi \middle\vert \psi\right\rangle}$.</p>
<p>These results are quite similar to Wigner’s theorem, which states that
any invertible mapping $T$ that preserves the <em>norms</em> of inner products
and that maps <em>rays</em> into rays (i.e. if $v=\alpha u$ for some scalar
$\alpha$ then there exists a scalar $\beta$ such that
$T(v)=\beta T(u)$; the inverse must map rays into rays as well) must be
unitary or antiunitary.</p>
<p>What we have shown is that any <em>linear</em> $A$-symmetry must be a unitary
transformation. In particular, if $\hat{U}$ is a unitary $A$-symmetry,
then the $\hat{U}$-conjugate of $\hat{A}$ satisfies</p>
\[\begin{aligned}
{\left\langle \psi \middle\vert \hat{U}^{-1}\hat{A}\hat{U}\middle\vert\psi\right\rangle} &= {\left\langle \psi \middle\vert \hat{U}^{\dagger}\hat{A}\hat{U}\middle\vert\psi\right\rangle}\\
&={\left\langle (\hat{U}\psi) \middle\vert \hat{A}\middle\vert(\hat{U}\psi)\right\rangle}\\
&= {\left\langle \psi \middle\vert \hat{A}\middle\vert\psi\right\rangle}.
\end{aligned}\]
<p>This is because $\hat{U}^{\dagger}=\hat{U}^{-1}$, and
since $\hat{U}$ is an $A$-symmetry, it preserves the expectation value
of $A$. This holds for any
${\left\vert \psi \right\rangle}\in \mathcal{H}$, therefore
$\hat{U}^{-1}\hat{A}\hat{U}=\hat{A}$, which means that
$\hat{A}\hat{U}=\hat{U}\hat{A}$, or equivalently, $[\hat{A},\hat{U}]=0$.</p>
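Both facts are easy to see numerically. The following sketch (my own, with an arbitrary random observable) builds a unitary that only multiplies each eigenstate of $\hat{A}$ by a phase, and checks that it preserves the probability distribution of $A$ and commutes with $\hat{A}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# a random Hermitian "observable" A, with orthonormal eigenvectors as columns
H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (H + H.conj().T) / 2
eigvals, eigvecs = np.linalg.eigh(A)  # columns of eigvecs are the |a_j>

# a unitary A-symmetry: multiply each eigenstate |a_j> by an arbitrary phase
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=d))
U = eigvecs @ np.diag(phases) @ eigvecs.conj().T

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

# the distribution |<a_j|psi>|^2 is unchanged by U...
p_before = np.abs(eigvecs.conj().T @ psi) ** 2
p_after = np.abs(eigvecs.conj().T @ (U @ psi)) ** 2
print(np.allclose(p_before, p_after))

# ...and so is the expectation value, and [A, U] = 0
exp_before = np.vdot(psi, A @ psi).real
exp_after = np.vdot(U @ psi, A @ (U @ psi)).real
print(abs(exp_before - exp_after) < 1e-10, np.allclose(A @ U, U @ A))
```

Conversely, any unitary commuting with $\hat{A}$ preserves its eigenspaces, which is why this construction is the generic form of a linear $A$-symmetry (for non-degenerate $\hat{A}$).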
<p>There are different levels on which we can describe symmetry. The first
one we discussed is determined <em>pointwise</em>, that is, only two states
being symmetric with respect to an observable. Then, one step further,
we described transformations $T$ that are also $A$-symmetries. Under
these transformations, every pair
${\left\vert \psi \right\rangle},T{\left\vert \psi \right\rangle}$ are
$A$-symmetric. If this operator is to be linear, then it must also be
unitary. Most often, in the standard QM references, the symmetries that
are considered are unitary transformations that are symmetric with
respect to the Hamiltonian (energy) observable.</p>
<p>We can go one step further and consider symmetries with respect to <em>all</em>
observables, which we call <strong>absolute symmetries</strong><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Any two states
that are related by these symmetries are <em>absolutely indistinguishable</em>.
What this means is that if $T$ is an absolute symmetry and
${\left\vert \psi \right\rangle}=T{\left\vert \phi \right\rangle}$, then
${\left\vert \psi \right\rangle}$ and ${\left\vert \phi \right\rangle}$
have the exact same distributions for <em>all observables</em>. This means that
they are indistinguishable!<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>
<p>Of course, if our system allows for these kinds of absolute symmetries,
then <em>our state space is defined with redundancies</em>, since we <em>should</em>
define the states and the observables in such a way that two states are
equal if and only if they have the same distribution for all
observables. In truth, absolute symmetries arise from being sloppy in
the definition of states.</p>
<p>We will now attempt to characterize absolute symmetries. So let $T$ be
an absolute symmetry. Consider any state
${\left\vert \psi \right\rangle}$, and complete it to an orthonormal
basis
${\left\vert \psi \right\rangle}={\left\vert \psi_1 \right\rangle},{\left\vert \psi_2 \right\rangle},{\left\vert \psi_3 \right\rangle},\dots$.
Since $T$ is an absolute symmetry, for all Hermitian $\hat{A}$, we have
that
${\left\langle (T\psi) \middle\vert \hat{A}\middle\vert(T\psi)\right\rangle}={\left\langle \psi \middle\vert \hat{A}\middle\vert\psi\right\rangle}$.
In particular, we consider the projection operators with respect to the
elements of the basis,</p>
\[P_{\psi_j}={\left\vert \psi_j \right\rangle}{\left\langle \psi_j \right\vert}.\]
<p>Then we have that</p>
\[\begin{aligned}
{\left\langle (T\psi_i) \middle\vert P_{\psi_j}\middle\vert(T\psi_i)\right\rangle} &= {\left\langle (T\psi_i) \middle\vert \psi_j\right\rangle}{\left\langle \psi_j \middle\vert T\psi_i\right\rangle}\\
&= \left\vert{\left\langle \psi_j \middle\vert T\psi_i\right\rangle} \right\vert^2.
\end{aligned}\]
<p>On the other hand,</p>
\[\begin{aligned}
{\left\langle (T\psi_i) \middle\vert P_{\psi_j}\middle\vert(T\psi_i)\right\rangle} &= {\left\langle \psi_i \middle\vert P_{\psi_j}\middle\vert\psi_i\right\rangle}\\
&= {\left\langle \psi_i \middle\vert \psi_j\right\rangle}{\left\langle \psi_j \middle\vert \psi_i\right\rangle}\\
&= \left\vert {\left\langle \psi_i \middle\vert \psi_j\right\rangle} \right\vert^2\\
&= \delta_{ij}.
\end{aligned}\]
<p>This means that</p>
\[\left\vert{\left\langle \psi_j \middle\vert T\psi_i\right\rangle} \right\vert^2 = \delta_{ij},\]
<p>so that, say,
${\left\langle \psi_j \middle\vert T\psi_j\right\rangle}=e^{i\theta_j}$.
From here we obtain that</p>
\[T{\left\vert \psi_j \right\rangle}= \sum_k {\left\langle \psi_k \middle\vert T\psi_j\right\rangle}{\left\vert \psi_k \right\rangle} = e^{i\theta_j}{\left\vert \psi_j \right\rangle}.\]
<p>Now we don’t know that $T$ is linear or antilinear, but it certainly
does preserve rays. For any given state
${\left\vert \varphi \right\rangle}$ we repeat the previous process and
obtain that there exists a unitary scalar $\alpha_\varphi$ such that
$T{\left\vert \varphi \right\rangle}=\alpha_\varphi{\left\vert \varphi \right\rangle}$.
Then if
${\left\vert \psi \right\rangle}=\beta{\left\vert \phi \right\rangle}$,
we have that</p>
\[\begin{aligned}
T{\left\vert \psi \right\rangle} = \alpha_{\psi}{\left\vert \psi \right\rangle} =
\alpha_{\psi}\beta{\left\vert \phi \right\rangle} =
\alpha_{\psi}\beta\alpha_{\phi}^{-1}T{\left\vert \phi \right\rangle}.
\end{aligned}\]
<p>Then $T$ maps rays into rays, and it preserves the
norms of inner products,</p>
\[\vert{\left\langle (T\psi) \middle\vert (T\phi)\right\rangle}\vert = \vert\alpha_{\psi}^*\alpha_{\phi}{\left\langle \psi \middle\vert \phi\right\rangle} \vert = \vert{\left\langle \psi \middle\vert \phi\right\rangle}\vert,\]
<p>and thus it satisfies the hypotheses of Wigner’s theorem, so $T$ must be
unitary or antiunitary. Suppose that it is unitary, so it preserves
inner products. Then for any
${\left\vert \psi \right\rangle},{\left\vert \phi \right\rangle}$, it
follows that</p>
\[{\left\langle \psi \middle\vert \phi\right\rangle}={\left\langle (T\psi) \middle\vert T\phi\right\rangle}=\alpha_{\psi}^*\alpha_{\phi}{\left\langle \psi \middle\vert \phi\right\rangle}.\]
<p>Then $\alpha_{\psi}^*\alpha_{\phi}=1$. However, since
$\vert\alpha_{\psi}\vert=1$, then $\alpha_{\psi}^{-1}=\alpha_{\psi}^*$,
and this implies that $\alpha_{\psi}^{-1}\alpha_{\phi}=1$, or rather,
$\alpha_{\psi}=\alpha_{\phi}$. In conclusion, if $T$ is a unitary
absolute symmetry, then it is multiplication by a unit scalar:
\[T{\left\vert \psi \right\rangle}=\alpha{\left\vert \psi \right\rangle}\quad \forall {\left\vert \psi \right\rangle}.\]
<p>And we <em>know</em> that this is a redundancy in our description of the
states! In truth, we should work in the <em>projective</em> Hilbert space
$\mathbb{P}\mathcal{H}$, in which we assume that vectors which are related by
multiplication by any unitary scalar represent <em>one and the same state</em>.</p>
<h1 id="describing-multiparticle-states">Describing multiparticle states</h1>
<p>Suppose that we have two systems $S_1,S_2$, described by Hilbert spaces
$\mathcal{H}_1,\mathcal{H}_2$, respectively. The mathematical setting
for describing the joint system is the completion $\mathcal{H}$ of
the <strong>tensor product</strong><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> $\mathcal{H}_1\otimes \mathcal{H}_2$. The
inner product in $\mathcal{H}$ is term-wise as</p>
\[{\left\langle u_1\otimes u_2 \middle\vert v_1\otimes v_2\right\rangle}:= {\left\langle u_1 \middle\vert v_1\right\rangle}{\left\langle u_2 \middle\vert v_2\right\rangle}.\]
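In finite dimensions the tensor product can be modeled by NumPy’s Kronecker product, and the product rule for inner products can be verified directly (a sketch with arbitrary random vectors; `np.vdot` conjugates its first argument, matching the physics convention):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_vec(d):
    # an arbitrary (unnormalized) vector in C^d
    return rng.normal(size=d) + 1j * rng.normal(size=d)

u1, v1 = rand_vec(3), rand_vec(3)   # elements of H_1
u2, v2 = rand_vec(4), rand_vec(4)   # elements of H_2

# <u1 (x) u2 | v1 (x) v2> = <u1|v1> <u2|v2>
lhs = np.vdot(np.kron(u1, u2), np.kron(v1, v2))
rhs = np.vdot(u1, v1) * np.vdot(u2, v2)
print(np.allclose(lhs, rhs))
```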
<p>There are a few different notations for product elements of
$\mathcal{H}$, but we will mostly stick to
$\left\vert u_1 u_2 \right\rangle:=\left\vert u_1 \right\rangle\otimes\left\vert u_2 \right\rangle$.
Note that not every state (actually <em>very few</em> states) in $\mathcal{H}$
can be written as a tensor product of two states in $\mathcal{H}_1$ and
$\mathcal{H}_2$. A general state is a <em>linear</em> combination of states of
the form $\left\vert u_1 u_2 \right\rangle$, with
$u_i\in \mathcal{H}_i$, and if
$\{\left\vert \varphi_i \right\rangle\}$, $\{\left\vert \psi_j \right\rangle\}$
are bases for $\mathcal{H}_1$ and $\mathcal{H}_2$, then
$\{\left\vert \varphi_i \psi_j \right\rangle\}$ is a
basis for $\mathcal{H}$.</p>
<p>Any observable $\hat{A}$ over the space $ \mathcal{H}_1 $ can be
“extended” to an observable over $\mathcal{H}$, by identifying it with
$\hat{A}\otimes \text{id}_{\mathcal{H}_2}$. Similarly for observables
over $\mathcal{H}_2$.</p>
<p>I will not go into all the nuances and interesting details of these kind
of states (e.g. entangled states and such), since what we are interested
in is representing indistinguishability.</p>
<h1 id="indistinguishability-defined">Indistinguishability defined</h1>
<p>For the time-being, we will only work with $2$-particle spaces, and then
we will go nuts and go to general $k$-particle spaces. Consider the case
where $\mathcal{H}_1=\mathcal{H}_2:=\mathcal{H}$, in which case the
total Hilbert space is $\mathcal{H}\otimes\mathcal{H}$. By definition,
the states ${\left\vert \psi_1\psi_2 \right\rangle}$ and
${\left\vert \psi_2\psi_1 \right\rangle}$ are considered <em>different</em>.
However, we want them to <em>represent the same state</em>. There are two ways
to go about this:</p>
<ol>
<li>
<p>Scrap everything and rethink a suitable Hilbert space that clearly
represents indistinguishable particles without any redundancy, or</p>
</li>
<li>
<p>force any two states that represent the same configuration (e.g. as
above) to be <em>absolutely symmetric</em>.</p>
</li>
</ol>
<p>If you’re all the way down here, you’ll be glad to read that we will not
choose, I repeat, not choose option 1. Even more, working with option 2
will actually <em>suggest</em> which Hilbert space to use that would work for
option 1. It’s a win-win.</p>
<p>Then we choose option 2. Define a transformation $T$<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> that acts on
product terms as</p>
\[T{\left\vert \psi_1\psi_2 \right\rangle}={\left\vert \psi_2\psi_1 \right\rangle},\]
<p>and extend it by linearity. This is a <em>permutation</em> transformation.
Since we require $T$ to be a (unitary) absolute symmetry, then by the
result of the previous section $T$ must behave as multiplication by some
unit scalar $\alpha$, that is, $T=\alpha I$. However, we also have that permuting twice should do nothing (if you
switch $1$ and $2$, then switch them again, you get $1$ and $2$), so
$T^2=I$. But we also have that $T^2=\alpha^2I$, so this implies that
$\alpha^2=1$. Then $\alpha=\pm 1$, that is, $T=\pm I$.</p>
<p>But wait!</p>
<p>This cannot possibly be true for <em>all</em> elements of
$\mathcal{H}\otimes\mathcal{H}$! As a counterexample, consider
<em>literally any pair of linearly independent elements</em>
$\left\vert \psi_1 \right\rangle,\left\vert \psi_2 \right\rangle\in \mathcal{H}$.
<strong>By definition</strong>, the tensor products
$\left\vert \psi_1\psi_2 \right\rangle$ and
$\left\vert \psi_2\psi_1 \right\rangle$ are not only <em>different</em>, but
also <em>linearly independent</em>. This means that</p>
\[T\left\vert \psi_1\psi_2 \right\rangle = \left\vert \psi_2\psi_1 \right\rangle\overset{!!!}{\neq}\pm \left\vert \psi_1\psi_2 \right\rangle.\]
<p>So the relation $T=\pm I$ cannot be true for <em>all</em> vectors! Did we
arrive at a contradiction? Is it <em>impossible</em> to represent the state
space of two identical particles with permutation symmetry? Well… no.
We just showed a counterexample where
$T{\left\vert \psi_1\psi_2 \right\rangle}\neq \pm{\left\vert \psi_1\psi_2 \right\rangle}$,
but there <em>are</em> some vectors in $\mathcal{H}\otimes\mathcal{H}$
where permutation <em>is</em> a symmetry. For example, consider the vector</p>
\[\left\vert \psi \right\rangle = \frac{1}{\sqrt{2}}\left(\left\vert \psi_1\psi_2 \right\rangle+\left\vert \psi_2\psi_1 \right\rangle\right).\]
<p>If we apply $T$, then we get</p>
\[\begin{aligned}
T\left\vert \psi \right\rangle &= \frac{1}{\sqrt{2}}\left(T\left\vert \psi_1\psi_2 \right\rangle+T\left\vert \psi_2\psi_1 \right\rangle\right)\\
&= \frac{1}{\sqrt{2}}\left(\left\vert \psi_2\psi_1 \right\rangle+\left\vert \psi_1\psi_2 \right\rangle\right)\\
&= \frac{1}{\sqrt{2}}\left(\left\vert \psi_1\psi_2 \right\rangle+\left\vert \psi_2\psi_1 \right\rangle\right)\\
&= \left\vert \psi \right\rangle.
\end{aligned}\]
<p>Similarly, consider the vector</p>
\[\left\vert \psi' \right\rangle = \frac{1}{\sqrt{2}}\left(\left\vert \psi_1\psi_2 \right\rangle-\left\vert \psi_2\psi_1 \right\rangle\right).\]
<p>Upon acting with $T$,</p>
\[\begin{aligned}
T\left\vert \psi' \right\rangle &= \frac{1}{\sqrt{2}}\left(T\left\vert \psi_1\psi_2 \right\rangle-T\left\vert \psi_2\psi_1 \right\rangle\right)\\
&= \frac{1}{\sqrt{2}}\left(\left\vert \psi_2\psi_1 \right\rangle-\left\vert \psi_1\psi_2 \right\rangle\right)\\
&= -\frac{1}{\sqrt{2}}\left(\left\vert \psi_1\psi_2 \right\rangle-\left\vert \psi_2\psi_1 \right\rangle\right)\\
&= -\left\vert \psi' \right\rangle.
\end{aligned}\]
<p>Then for <em>some</em> states, it is true that
$T\left\vert \psi \right\rangle=\pm\left\vert \psi \right\rangle$.</p>
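<p>As a quick sanity check, we can replay this two-particle computation numerically in a finite-dimensional toy model. The sketch below (NumPy; the variable names are mine, and $\mathcal{H}=\mathbb{C}^2$ with real vectors stands in for a general Hilbert space) represents product vectors as $2\times 2$ arrays, so that $T$ is simply the matrix transpose:</p>

```python
import numpy as np

# Two orthonormal single-particle states in H = C^2 (real entries for simplicity).
psi1 = np.array([1.0, 0.0])
psi2 = np.array([0.0, 1.0])

# Product vectors |psi1 psi2> and |psi2 psi1> as 2x2 arrays
# (first index = first particle, second index = second particle).
p12 = np.outer(psi1, psi2)
p21 = np.outer(psi2, psi1)

# T exchanges the two tensor factors, i.e. transposes the array.
T = lambda v: v.T

sym  = (p12 + p21) / np.sqrt(2)   # |psi>  in the text
anti = (p12 - p21) / np.sqrt(2)   # |psi'> in the text

assert np.allclose(T(sym),  sym)    # T|psi>  = +|psi>
assert np.allclose(T(anti), -anti)  # T|psi'> = -|psi'>
# ...while a generic product vector is not an eigenvector of T at all:
assert not np.allclose(T(p12), p12) and not np.allclose(T(p12), -p12)
```

<p>The last assertion is exactly the "counterexample" above: $T{\left\vert \psi_1\psi_2 \right\rangle}={\left\vert \psi_2\psi_1 \right\rangle}$ is linearly independent of ${\left\vert \psi_1\psi_2 \right\rangle}$.</p>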
<p>What do we do now? We <em>restrict</em> our state space to one where there is a
permutation symmetry, and we simply <em>ignore</em> all the other vectors that
do not have this symmetry. So we define the <strong>symmetric product</strong> of
$\mathcal{H}$ as the set of all vectors
${\left\vert \psi \right\rangle}\in\mathcal{H}\otimes\mathcal{H}$
for which
$T{\left\vert \psi \right\rangle}={\left\vert \psi \right\rangle}$:</p>
\[\mathcal{S}^2\mathcal{H} := \left\{\left\vert \psi \right\rangle\in \mathcal{H}\otimes\mathcal{H}~:~T\left\vert \psi \right\rangle=\left\vert \psi \right\rangle\right\}.\]
<p>We call this symmetric product the <strong>boson state space</strong><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> (of two
particles). Similarly, we can also define the <strong>antisymmetric product</strong>
of $\mathcal{H}\otimes\mathcal{H}$ as</p>
\[\Lambda^2\mathcal{H} := \left\{\left\vert \psi \right\rangle\in \mathcal{H}\otimes\mathcal{H}~:~T\left\vert \psi \right\rangle=-\left\vert \psi \right\rangle\right\},\]
<p>and we call this the <strong>fermion state space</strong> of two particles. These
<em>fermionic</em> states have the particularity that two particles cannot be
in the same state! That is, the vectors
${\left\vert \psi\psi \right\rangle}$ (which represent two particles in
the same state) are <em>not</em> in the fermion state space. This is in stark
contrast with the bosonic case, where
${\left\vert \psi\psi \right\rangle}\in \mathcal{S}^2\mathcal{H}$.</p>
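<p>The claim about ${\left\vert \psi\psi \right\rangle}$ can also be checked directly: the standard projections $(I\pm T)/2$ onto the symmetric and antisymmetric subspaces show that a repeated state is entirely bosonic. A minimal sketch (NumPy, finite-dimensional toy model):</p>

```python
import numpy as np

# A single-particle state psi in H = C^3 (random real entries).
rng = np.random.default_rng(1)
psi = rng.standard_normal(3)

pp = np.outer(psi, psi)            # the product vector |psi psi>
anti_part = (pp - pp.T) / 2        # projection onto Lambda^2 H
sym_part  = (pp + pp.T) / 2        # projection onto S^2 H

assert np.allclose(anti_part, 0)   # |psi psi> has no fermionic component at all
assert np.allclose(sym_part, pp)   # ...and is already fully symmetric
```
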
<p>The beauty of this is that we are <em>not</em> violating any of the axioms of
quantum mechanics by restricting ourselves to this space. The boson and
fermion state spaces <em>are</em> Hilbert spaces (one must simply check that
they are closed subspaces of $\mathcal{H}\otimes\mathcal{H}$), and
in fact, we can even find a nice, juicy basis for them, given that we
already have one for $\mathcal{H}$. That comes next week, pinky
promise.</p>
<h1 id="there-are-spoiler-alert-more-than-two-particles-in-the-universe">There are (spoiler alert) more than two particles in the universe</h1>
<p>This astonishing discovery means that we’re going to need a bigger boat.
What if we have $k$ particles? The procedure is almost exactly the same
as in the previous section, but we have to be a bit more careful. In
this case, the ambient space “with redundancies” will be the $k$-th
tensor product of $\mathcal{H}$, which we write as
$\otimes^k\mathcal{H}$.</p>
<p>But now permutations get <em>a lot</em> more complicated since we can exchange
any number of particles. In general a $k$-permutation is a rearrangement
of the numbers $\left\{1,\dots,k\right\}$, which is to say a
bijective<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> function
$\sigma:\left\{1,\dots,k\right\}\to\left\{1,\dots,k\right\}$. The
set of all $k$-permutations is denoted $\mathfrak{S}_k$.</p>
<p>Our requirement of <em>symmetry under permutations</em> can be rewritten as
follows: For any permutation $\sigma\in \mathfrak{S}_k$, define a
transformation $T_\sigma$ on product vectors as</p>
\[T_\sigma\left\vert \psi_1\cdots\psi_k \right\rangle = \left\vert \psi_{\sigma(1)}\cdots\psi_{\sigma(k)} \right\rangle,\]
<p>and extend it to the entire space $\otimes^k\mathcal{H}$ by linearity.
As an example, let $k=5$, and consider the permutation
\(\sigma:(1,2,3,4,5)\mapsto (2,4,5,1,3).\) Then the action of $T_\sigma$
is</p>
\[T_{\sigma}\left\vert \psi_1\psi_2\psi_3\psi_4\psi_5 \right\rangle = \left\vert \psi_{\sigma(1)}\psi_{\sigma(2)}\psi_{\sigma(3)}\psi_{\sigma(4)}\psi_{\sigma(5)} \right\rangle = \left\vert \psi_2\psi_4\psi_5\psi_1\psi_3 \right\rangle\]
<p>(and extended by linearity). Note that $\psi_1,\dots,\psi_k$ do not have
to be different a priori for this to make sense.</p>
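<p>For a concrete sanity check: if we store a product vector $\left\vert \psi_1\cdots\psi_k \right\rangle$ as a $k$-index array, then $T_\sigma$ is just a permutation of the array axes. A sketch (NumPy; the $0$-indexed convention for $\sigma$ is my own bookkeeping, not standard notation):</p>

```python
import numpy as np
from functools import reduce

def product_state(vectors):
    """|psi_1 ... psi_k> stored as a k-index array."""
    return reduce(np.multiply.outer, vectors)

def T(Psi, sigma):
    """T_sigma |psi_1...psi_k> = |psi_{sigma(1)}...psi_{sigma(k)}>,
    with sigma given 0-indexed, i.e. sigma[i] = sigma(i+1) - 1."""
    return np.transpose(Psi, axes=sigma)

rng = np.random.default_rng(0)
psis = [rng.standard_normal(2) for _ in range(5)]
sigma = [1, 3, 4, 0, 2]   # the post's example (1,...,5) -> (2,4,5,1,3)

# Applying T_sigma to the product state agrees with permuting the factors by hand.
lhs = T(product_state(psis), sigma)
rhs = product_state([psis[i] for i in sigma])
assert np.allclose(lhs, rhs)
```

<p>Since <code>np.transpose</code> is linear in the array, this automatically extends $T_\sigma$ to all of $\otimes^k\mathcal{H}$, as required.</p>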
<p>Also note that $\sigma\mapsto T_\sigma$ gives a linear action, or what we
call a <strong>representation</strong> of $\mathfrak{S}_k$ on
$\otimes^k\mathcal{H}$. (Strictly speaking, with this convention one
computes $T_\sigma\circ T_{\sigma'}=T_{(\sigma'\circ\sigma)}$ for any pair
of permutations $\sigma,\sigma'\in \mathfrak{S}_k$, so $T$ is a
<em>right</em> action; replacing $\sigma$ by $\sigma^{-1}$ in the
definition turns it into a genuine left representation. The distinction
is harmless in what follows, since we only ever use transpositions,
which are their own inverses.)</p>
<p>Our requirement of <em>permutation invariance</em> is that $T_{\sigma}$ is an
<em>absolute (unitary) symmetry</em> <strong>for all permutations</strong>
$\sigma\in\mathfrak{S}_k$. This means that for each
$\sigma\in \mathfrak{S}_k$, there is a unit scalar $\alpha_\sigma$
such that $T_\sigma=\alpha_\sigma I$. In particular, let’s choose a
<em>transposition</em>, which simply switches two numbers,
$\tau:(1,\dots,i,\dots,j,\dots,k)\mapsto(1,\dots,j,\dots,i,\dots,k)$. We
write $(i~j)$ for the transposition that switches $i$ and $j$. Then
we’re back in known waters, since transpositions are their own inverses!
This means that $T_{(i~j)}^2 = I$, which, again, means that
$\alpha_{(i~j)}=\pm 1$, as in the two-particle case. However, this does
not mean that it is $1$ for <em>all</em> transpositions or $-1$ for <em>all</em>
transpositions; it might happen that $\alpha_{\tau}=1$ for some
transpositions but $\alpha_{\tau}=-1$ for others. Just because this is
getting too long, I will leave as an appendix<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup> the proof that,
indeed, it should be the same for all transpositions, i.e.
$\alpha_{(i~j)}=1$ for all transpositions or $\alpha_{(i~j)}=-1$ for all
transpositions.</p>
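<p>The fact that $T_{(i~j)}^2=I$ holds on <em>every</em> vector of $\otimes^k\mathcal{H}$, not just on product vectors, is easy to verify numerically. A sketch (NumPy, acting on a random $4$-particle state):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
Psi = rng.standard_normal((2, 2, 2, 2))  # arbitrary 4-particle state, dim 2 each

def transpose_particles(Psi, i, j):
    """T_(i j): exchange tensor factors i and j (0-indexed)."""
    axes = list(range(Psi.ndim))
    axes[i], axes[j] = axes[j], axes[i]
    return np.transpose(Psi, axes)

once  = transpose_particles(Psi, 1, 3)
twice = transpose_particles(once, 1, 3)
assert np.allclose(twice, Psi)   # transpositions are their own inverses: T^2 = I
```
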
<p>In order to continue, we now need an important result of group theory,
which we will not prove: every permutation can be
written as a (non-unique) product of transpositions. Even though this
product is not unique, what <em>is</em> unique is the <em>parity</em> of the number of
transpositions needed to express a permutation. That is, a permutation
that can be written as an <em>even</em> number of transpositions can <em>only</em> be
written as an even number of transpositions, and similarly, a
permutation that can be written as an <em>odd</em> number of transpositions can
<em>only</em> be written as an odd number of transpositions.</p>
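<p>The parity statement can be checked computationally: decompose a permutation into transpositions in <em>one particular</em> way, and compare the parity of the count with the inversion-count formula, which does not depend on any decomposition. A sketch (plain Python; the greedy cycle-sort decomposition is just one choice among many):</p>

```python
def sign_by_inversions(perm):
    """(-1)^(number of out-of-order pairs); independent of any decomposition."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def decompose_into_transpositions(perm):
    """One particular way to write perm (0-indexed) as a product of
    transpositions: greedily swap entries into place (cycle sort)."""
    perm, swaps = list(perm), []
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            swaps.append((i, j))
    return swaps

# The 0-indexed version of the post's example (1,...,5) -> (2,4,5,1,3):
sigma = [1, 3, 4, 0, 2]
m = len(decompose_into_transpositions(sigma))
# However we decompose sigma, the parity of m is the same:
assert (-1) ** m == sign_by_inversions(sigma)
```
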
<p>The <strong>sign</strong> of a permutation $\sigma$, denoted
$\operatorname{sgn}(\sigma)$, is defined as $1$ if it can be written
with an even number of transpositions, and $-1$ if it can be written
with an odd number of transpositions. With this definition, we are
nearly done. Suppose that $\alpha_{(i~j)}=1$ for all transpositions.
Then we have, for any permutation $\sigma$, that</p>
\[T_\sigma = T_{\tau_1\cdots\tau_m} = T_{\tau_1}\cdots T_{\tau_m} = 1\cdots 1 I = I,\]
<p>where $\tau_1,\dots,\tau_m$ are transpositions that compose $\sigma$.
This is what we call the <strong>trivial representation</strong> of
$\mathfrak{S}_k$. In this case, we require the canonical action of
permutations to be the trivial representation:</p>
\[T_\sigma = I.\]
<p>However, we fall into the same trap as in the previous section. By
<em>definition</em>, this is simply not true for products of linearly
independent vectors, so we must restrict our scope to a subspace of
$\otimes^{k}\mathcal{H}$ where there <em>is</em> this kind of permutation
invariance. We define the <strong>$k$-th symmetric product</strong> of
$\mathcal{H}$ as</p>
\[\mathcal{S}^k\mathcal{H} := \left\{\left\vert \psi \right\rangle\in \otimes^k\mathcal{H}~:~T_\sigma \left\vert \psi \right\rangle = \left\vert \psi \right\rangle \text{ for all }\sigma\in \mathfrak{S}_k\right\}.\]
<p>And we call it the <strong>boson state space</strong> of $k$ particles. An example of
such a state is</p>
\[\left\vert \psi \right\rangle = \frac{1}{\sqrt{3}}\left(\left\vert \psi_1\psi_2\psi_2 \right\rangle + \left\vert \psi_2\psi_1\psi_2 \right\rangle + \left\vert \psi_2\psi_2\psi_1 \right\rangle \right)\in\mathcal{H}\otimes\mathcal{H}\otimes\mathcal{H}.\]
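<p>It is easy to verify that this example really lies in $\mathcal{S}^3\mathcal{H}$ and is normalized (the three terms are orthonormal, hence the $1/\sqrt{3}$). A sketch (NumPy, taking $\psi_1,\psi_2$ to be the standard basis of $\mathbb{C}^2$, here with real entries, which changes nothing):</p>

```python
import numpy as np
from itertools import permutations
from functools import reduce

psi1 = np.array([1.0, 0.0])
psi2 = np.array([0.0, 1.0])

def product_state(vs):
    """Tensor product |v_1 v_2 v_3> as a 3-index array."""
    return reduce(np.multiply.outer, vs)

# The three distinct orderings of (psi1, psi2, psi2), as in the text.
terms = {(0, 1, 1), (1, 0, 1), (1, 1, 0)}
basis = [psi1, psi2]
state = sum(product_state([basis[i] for i in t]) for t in terms) / np.sqrt(3)

# Invariant under every permutation of the three tensor factors...
for p in permutations(range(3)):
    assert np.allclose(np.transpose(state, p), state)
# ...and normalized.
assert np.isclose(np.sum(state**2), 1.0)
```
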
<p>But now we have the other choice, which is that $\alpha_{(i~j)}=-1$ for all
transpositions. In this case, the symmetry is not as simple. If $\sigma$
is a permutation, then we decompose it into transpositions,
$\sigma = \tau_1\cdots \tau_m$. The action of $T_\sigma$ is, then</p>
\[T_{\sigma} = T_{\tau_1\cdots \tau_m}=T_{\tau_1}\cdots T_{\tau_m} = (-1)^mI = \operatorname{sgn}(\sigma)I.\]
<p>This follows since $(-1)^m$ is precisely $1$ if $m$ is even and $-1$ if
$m$ is odd. Then our other choice for permutation symmetry is the
requirement that \(T_{\sigma} = \operatorname{sgn}(\sigma)I,\) which we
call the <strong>alternating representation</strong> of $\mathfrak{S}_k$. But
again! The same trap as before! This relationship does not hold for many
vectors in $\otimes^k\mathcal{H}$, so we restrict to a subspace where
this relationship does hold, which is the <strong>$k$-th alternating</strong> (or
exterior) <strong>product</strong> of $\mathcal{H}$:</p>
\[\Lambda^k\mathcal{H} := \left\{\left\vert \psi \right\rangle \in \otimes^k\mathcal{H}~:~T_\sigma \left\vert \psi \right\rangle =\operatorname{sgn}(\sigma) \left\vert \psi \right\rangle \text{ for all }\sigma\in \mathfrak{S}_k\right\}.\]
<p>And this space is called the <strong>fermion state space</strong> of $k$ particles.
Again, it contains no vectors in which two particles are in the same
state: if it did, the transposition of those two particles would have to
flip the sign of the vector while also leaving it unchanged, forcing the
vector to be zero. This means that <em>two fermionic particles cannot be
in the same state</em>, something that is called <em>Pauli’s exclusion
principle</em>.</p>
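<p>Pauli's exclusion principle can be seen very concretely by antisymmetrizing product vectors: the antisymmetrizer $\frac{1}{k!}\sum_{\sigma}\operatorname{sgn}(\sigma)T_\sigma$ projects onto $\Lambda^k\mathcal{H}$, and it annihilates any product vector with a repeated state. A sketch (NumPy; the helper names are mine):</p>

```python
import numpy as np
from itertools import permutations
from functools import reduce
from math import factorial

def product_state(vs):
    """Tensor product |v_1 ... v_k> as a k-index array."""
    return reduce(np.multiply.outer, vs)

def sgn(p):
    """Sign of a permutation via inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def antisymmetrize(vs):
    """Project the product vector |v_1 ... v_k> onto Lambda^k H."""
    k = len(vs)
    return sum(sgn(p) * product_state([vs[i] for i in p])
               for p in permutations(range(k))) / factorial(k)

rng = np.random.default_rng(3)
a, b, c = (rng.standard_normal(3) for _ in range(3))

assert not np.allclose(antisymmetrize([a, b, c]), 0)  # distinct states survive
assert np.allclose(antisymmetrize([a, b, a]), 0)      # a repeated state is killed
```

<p>The second assertion is exactly the parenthetical argument above: the terms cancel in pairs under the transposition exchanging the two identical factors.</p>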
<p>These spaces will be the setting for doing quantum mechanics of many
particles. They seem pretty similar, but in reality, they are extremely
different.</p>
<p>Our objective for next week will be writing down an orthonormal basis
for these spaces, given a basis for $\mathcal{H}$. Then we will
consider state spaces where there can be arbitrarily many particles,
and hopefully find a way into field theory.</p>
<h1 id="the-takeaway">The takeaway</h1>
<p>The assumption of indistinguishability implies that the Hilbert space
that represents multiparticle states (particles of the same kind) must
not distinguish between “particle $a$ is in state $x$ and particle $b$
is in state $y$” from “particle $a$ is in state $y$ and particle $b$ is
in state $x$”. Coming up with such a Hilbert space from scratch is
not trivial, so what we do instead is construct a Hilbert space whose
elements represent multiparticle states, but also in which the particles
are distinguishable. If one particle is represented by the Hilbert space
$\mathcal{H}$, then such a “distinguishable” multiparticle space is
$\otimes^k\mathcal{H}$. Then, in order to pass to
indistinguishability, we consider <em>subspaces</em> of
$\otimes^k\mathcal{H}$ in which <em>permutation is an absolute symmetry</em>.
There are only two of those, namely the <em>bosonic space</em> which is the
symmetric product $\mathcal{S}^k\mathcal{H}$, and the <em>fermionic
space</em> which is the antisymmetric product $\Lambda^k\mathcal{H}$. In
particular, the fermionic space does <em>not</em> have any elements in which
two particles are in the same state. This is Pauli’s Exclusion
Principle.</p>
<h2 id="references">References</h2>
<ul>
<li>
<p>Folland, G. B. (2008). <em>Quantum Field Theory: A Tourist Guide for
Mathematicians</em>, Chapters 3 and 4.</p>
</li>
<li>
<p>Sakurai, J. J. (1995). <em>Modern Quantum Mechanics, Revised Edition</em>,
Chapters 1, 4, and 6.</p>
</li>
<li>
<p>Beck, M. (2012). <em>Quantum Mechanics: Theory and Experiment</em>,
Chapter 8.</p>
</li>
<li>
<p>Weinberg, S. (2002). <em>The Quantum Theory of Fields I: Foundations</em>,
Chapter 2, Appendix A.</p>
</li>
</ul>
<h2 id="appendix-t_sigma-is-the-same-for-all-sigma">Appendix: $T_\sigma$ is the same for all $\sigma$</h2>
<p>The quick way to do this, which requires a bit of machinery from
algebra, is as follows (a more down-to-earth proof, I think, will follow
soon). Note that the representation
$T:\mathfrak{S}_k\to \left\{I,-I\right\}\cong \mathbb{Z}_2$ is
actually one-dimensional (since it maps into multiples of the identity),
and therefore $\operatorname{im}(T)$ is abelian. This means that $T$ must
factorize through the abelianization
$\mathfrak{S}_k^{ab}=\mathfrak{S}_k/[\mathfrak{S}_k,\mathfrak{S}_k]$,
where $[\mathfrak{S}_k,\mathfrak{S}_k]$ is generated by the
commutators $ghg^{-1}h^{-1}$. However, the map
$\operatorname{sgn}:\mathfrak{S}_k\to \left\{1,-1\right\}\cong \mathbb{Z}_2$
is a surjective homomorphism whose kernel is precisely<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>
$[\mathfrak{S}_k,\mathfrak{S}_k]$, so
$\mathfrak{S}_k^{ab}\cong \mathbb{Z}_2$. This means that
$T=\tilde{T}\circ \pi$, where $\tilde{T}:\mathbb{Z}_2\to\mathbb{Z}_2$ is
a homomorphism and $\pi=\mathrm{sgn}$ is the projection to the
abelianization. There are only two possible endomorphisms of
$\mathbb{Z}_2$, namely the trivial one and $\operatorname{id}$. So if
$\tilde{T}$ is trivial, then $T$ is trivial too. If
$\tilde{T}=\operatorname{id}$, then $T=\operatorname{sgn}$.</p>
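<p>For the skeptical reader, the two group-theoretic facts used above, that $\operatorname{sgn}$ is a surjective homomorphism onto $\left\{1,-1\right\}$ and that every commutator lies in its kernel, can be brute-force checked on a small symmetric group. A sketch (plain Python, exhausting all of $\mathfrak{S}_4$):</p>

```python
from itertools import permutations

def sgn(p):
    """Sign of a permutation (0-indexed tuple) via inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)

def inverse(p):
    r = [0] * len(p)
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

S4 = list(permutations(range(4)))

# sgn is a homomorphism S_4 -> {1, -1}...
assert all(sgn(compose(p, q)) == sgn(p) * sgn(q) for p in S4 for q in S4)
# ...and it is surjective...
assert {sgn(p) for p in S4} == {1, -1}
# ...so every commutator g h g^{-1} h^{-1} has sign +1, i.e. lies in its kernel.
assert all(sgn(compose(compose(g, h), compose(inverse(g), inverse(h)))) == 1
           for g in S4 for h in S4)
```
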
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Maybe the hypothesis is justified in that otherwise the standard
definition of entropy would not be extensive (Gibbs’ paradox)… But
maybe that is a problem with the definition of entropy, not the
distinguishability (oof long word!) of particles. Again, I’m not
opening that can of worms. Yet. Stay tuned! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Of course, there is the subtlety of the case where $\hat{A}$ has a
continuous spectrum without eigenstates, but this can be formally
extended as probability densities. I think. Don’t think about it too
much. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Don’t quote me on this, I just made this up. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>If this is not convincing, then ask yourself “how would I tell two
such states apart?” There is no observable that can distinguish
between them. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>The pragmatic justification for this is that this is the simplest
non-trivial way to make a Hilbert space out of $\mathcal{H}_1$ and
$\mathcal{H}_2$, but I am sure that this can be justified using the
spectra of observables of the joint system and a little bit of
probability theory. Stay tuned! <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>This notation is not standard. It is usually called $p$ or $P$ or
something like that or maybe something entirely different because
nobody wants confusion with the momentum operator. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>“Why “boson”? What does this have to do with spin? Aaaah!” Well,
again. <a href="https://youtu.be/aIKPqxLxXTY?t=3">A good question… for another
time.</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Recall that this means that for each $i$ there is one <em>and only
one</em> number $j$ such that $\sigma(j)=i$. This implies that when you
write down $\sigma(1),\dots,\sigma(k)$, you obtain exactly
$1,\dots,k$, just possibly in a different order. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>that is, a post in the future <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>One of the inclusions is trivial to prove, the other requires
noting that the product of two transpositions is a product of
$3$-cycles, and that $3$-cycles are always commutators. This is not
hard, but not obvious! <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

Santiago Quintero de los Ríos

How do we represent states with multiple particles? It is not trivial to construct Hilbert spaces from scratch that represent indistinguishable particles, so what we do is construct a space that represents distinguishable particles, and then force that permutation of particle 'tags' is a symmetry.

Mini-post: The periodic 1D wave equation
2018-09-26T00:00:00+02:00
http://homotopico.com/physics/2018/09/26/periodic-wave-equation

<p>Get this post in <a href="/assets/docs/pdf_posts/periodic-wave-equation.pdf">pdf format here</a>.</p>
<p>O, many times have I seen the wave equation with periodic boundary
conditions:</p>
\[\frac{1}{v^2}{\frac{\partial ^2}{\partial t^2}}\varphi(t,x) - {\frac{\partial ^2}{\partial x^2}}\varphi(t,x) = 0,\]
<p>subject to $\varphi(t,0)=\varphi(t,L)$ for all $t$. Don’t we all just
<strong>know</strong> that the solutions are linear combinations of elements of the
form</p>
\[e^{\pm i(k_n x \pm \omega_n t)},\]
<p>with \(k_n = \frac{2\pi n}{L},\quad{n\in \mathbb{Z}},\) and
$\omega_n = |k_n||v|$?</p>
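<p>Before re-deriving this, we can at least confirm numerically that these exponentials solve the equation and satisfy the boundary condition. A sketch (NumPy, with the second derivatives approximated by central finite differences; the specific values of $n$, $L$, $v$ and the test point are arbitrary):</p>

```python
import numpy as np

L, v, n = 2.0, 1.5, 3          # arbitrary choices
k = 2 * np.pi * n / L          # k_n = 2*pi*n/L
w = abs(k) * abs(v)            # omega_n = |k_n||v|

phi = lambda t, x: np.exp(1j * (k * x - w * t))

# Central finite differences for the second derivatives at a test point.
h, t0, x0 = 1e-4, 0.7, 0.3
phi_tt = (phi(t0 + h, x0) - 2 * phi(t0, x0) + phi(t0 - h, x0)) / h**2
phi_xx = (phi(t0, x0 + h) - 2 * phi(t0, x0) + phi(t0, x0 - h)) / h**2

# The wave-equation residual vanishes up to discretization error
# (compare with the natural scale k**2 ~ 89)...
residual = phi_tt / v**2 - phi_xx
assert abs(residual) < 1e-3

# ...and the periodic boundary condition holds, since k_n * L = 2*pi*n.
assert np.isclose(phi(t0, 0.0), phi(t0, L))
```
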
<p>Don’t we all <strong>knoooooooow</strong> this?</p>
<p>Well I forgot how to do this and I should be doing other things but here
it goes.</p>
<h2 id="what-we-all-did-sometime-in-the-remote-past">What we all did sometime in the remote past</h2>
<p>Separate variables (of course!) by writing $\varphi(t,x)=f(x)g(t)$. Then
we have that $\partial_t\varphi(t,x) = f(x)g'(t)$ and
$\partial_x\varphi(t,x)=f'(x)g(t)$, and similarly for the second derivatives. Substitute that in:</p>
\[\frac{1}{v^2}{\frac{\partial ^2}{\partial t^2}}\varphi(t,x) - {\frac{\partial ^2}{\partial x^2}}\varphi(t,x) = \frac{1}{v^2}f(x)g''(t) - f''(x)g(t) .\]
<p>Assume that $f(x)g(t)\neq 0$, divide the whole equation by
$f(x)g(t)$, and rearrange. Then we have</p>
\[\frac{1}{v^2}\frac{1}{g(t)}g''(t) = \frac{1}{f(x)}f''(x).\]
<p>Now fix
some value of $t$, say $t=0$. This equation implies that for all $x$</p>
\[\frac{1}{f(x)}f''(x) = \frac{1}{v^2}\frac{1}{g(0)}g''(0) = \alpha,\]
<p>where we have <strong>defined</strong> $\alpha$ as the right-hand side. It’s clearly
a constant. Similarly, if we fix an $x$, say $x=0$, we have that for
<strong>all</strong> $t$, the following equation holds:</p>
\[\frac{1}{v^2}\frac{1}{g(t)}g''(t) = \frac{1}{f(0)}f''(0)=\alpha,\]
<p>where the rightmost equality follows from the previous equation (which
holds for all $x$, in particular $x=0$). Then <strong>both</strong> terms are equal
to $\alpha$, a constant to be determined.</p>
<p>Let’s try to determine that. Let’s work on the equation for $f$. We have
that</p>
\[f''(x) = \alpha f(x),\]
<p>which we recognize as a simple
second-order homogeneous linear differential equation. The solutions to
this equation are of the form</p>
\[f(x) = C_1e^{\mu x} + C_2e^{-\mu x},\]
<p>where $C_1,C_2$ are constants to be determined and $\mu^2=\alpha$. Note
that $\mu$ might be complex, depending on whether $\alpha$ is positive
or negative (or complex too!). Now the periodic boundary conditions
imply $f(0)=f(L)$, so</p>
\[f(0) = C_1+C_2 = C_1e^{\mu L } + C_2 e^{-\mu L} = f(L).\]
<p>Save that for later. We also have that $f'(0) = f'(L)$, so</p>
\[f'(0) = \mu C_1 - \mu C_2 = \mu C_1 e^{\mu L } - \mu C_2 e^{-\mu L}.\]
<p>This implies that, assuming that $\mu\neq 0$,</p>
\[C_1 - C_2 = C_1e^{\mu L} - C_2e^{-\mu L}.\]
<p>Adding the conditions for $f(0)=f(L)$ and $f'(0)=f'(L)$, we obtain that \(C_1 = C_1 e^{\mu L},\)</p>
<p>which implies that either $C_1=0$ or $e^{\mu L}=1$. In the first case,
to avoid the trivial solution we must require $C_2\neq 0$, and then
$f(0)=f(L)$ forces $e^{-\mu L}=1$. Either way we conclude that
$e^{\mu L}=1$, which implies that $\mu L = 2\pi n i$ for some $n\in \mathbb{Z}$.</p>
<p>Therefore we can write</p>
\[k_n = \frac{2\pi n}{L},\quad n\in \mathbb{Z},\]
<p>so that $\mu = i k_n$ and the general solution to $f$ is</p>
\[f(x) = C_1e^{ik_n x} + C_2 e^{-i k_n x}.\]
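<p>A quick numerical check that this $f$ really is periodic in the required sense, and that a wavenumber <em>not</em> of the form $2\pi n/L$ is not. A sketch (NumPy; the constants are arbitrary):</p>

```python
import numpy as np

L, n = 3.0, 2
k = 2 * np.pi * n / L              # k_n = 2*pi*n/L
C1, C2 = 0.8 - 0.2j, 1.1 + 0.5j    # arbitrary constants

f  = lambda x: C1 * np.exp(1j * k * x) + C2 * np.exp(-1j * k * x)
df = lambda x: 1j * k * (C1 * np.exp(1j * k * x) - C2 * np.exp(-1j * k * x))

assert np.isclose(f(0.0), f(L))    # f(0)  = f(L)
assert np.isclose(df(0.0), df(L))  # f'(0) = f'(L)

# A wavenumber that is not quantized as 2*pi*n/L breaks periodicity:
fbad = lambda x: np.exp(1j * 1.37 * x)
assert not np.isclose(fbad(0.0), fbad(L))
```
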
<p>Nearly done. Now we work with the equation for $g$:</p>
\[g''(t) = v^2\alpha g(t).\]
<p>However, $\alpha = \mu^2 = (ik_n)^2=-k_n^2$, so the general solution is</p>
\[g(t) = K_1e^{ik_n|v|t}+K_2e^{-i k_n |v|t}.\]
<p>Here the constants $K_1,K_2$ are left unknown. Now we multiply $g(t)$ by $f(x)$:</p>
\[\varphi(t,x) = f(x)g(t) = C_1K_1e^{i(k_n x + k_n|v|t)}+C_1K_2e^{i(k_n x - k_n|v|t)} + C_2K_1 e^{i(-k_n x + k_n|v|t)} + C_2K_2e^{i(-k_n x - k_n|v|t)}.\]
<p>Now let $\omega_n = |k_n||v|$. Then the solution is a linear combination
of elements of the form</p>
\[e^{\pm i (k_n x \pm \omega_n t)}.\]

Santiago Quintero de los Ríos

Wherein I very briefly discuss the general solution to the periodic 1D wave equation.