3.4. Integration in higher dimensions#
So, let’s briefly reflect on the journey we’ve taken so far.
3.4.1. The path we’ve taken#
3.4.1.1. The basics of measure theory#
We started way back in Section 2.1 by introducing some basics about sets and families of sets. We learned that we could equip certain families of sets (sets of subsets) with a notion of measure that, for all intents and purposes, satisfied the intuitive concept of “size”.
Upon these so-called measure spaces, we could construct functions, called random variables, whose inverse mappings sent sets of the codomain’s \(\sigma\)-algebra to sets of the domain’s \(\sigma\)-algebra (measurable functions).
If we confined ourselves to probability spaces, the pushforward measures of these random variables could be qualified as laws: probability measures induced on the codomain. The law of a random variable could further be uniquely determined by its distribution function.
3.4.1.2. The basics of integration#
Next, we built out all of the machinery that we needed to devise integrals on measure spaces in which the codomain is the real numbers.
We discussed how we could build up from simple functions, to bounded functions, to non-negative functions, to integrable functions, with each integral, in effect, being well-defined through sequences of simpler functions which, in the limit, converge to our desired function of interest.
Finally, we saw that by restricting ourselves to probability spaces, these integrals could be further characterized as expectations, and we obtained even more properties about integrals with respect to these particular spaces.
So, what did we miss?
3.4.1.3. Pushforward measures allow us to extend integration beyond real-valued codomains#
You’ll notice that, in the reflection we described above, there was an important clarification: in all of the integration material we’ve covered so far, the codomain is strictly the real numbers. This means that for a given event space \(\Omega\), the only random variables \(X\) that we know how to integrate with (thus far) are those where \(X(\omega)\) takes real values.
How big do you think a brain imaging measurement is? How complicated do you think a genome might be? There are tons of types of data that, for modern scientific practices, really cannot just be summarized with a real value. This means that as we are building out our machinery, we left out one of the biggest pieces of all: what do we do when we have more than a single real-valued number? How can we proceed?
As it so happens, a concept we already discussed, pushforward measures, tied together with a new concept which we’ll get to in a second, are all that we need to integrate on much more complicated spaces. At a high level, we’re going to take a similar approach to what you saw back when we first defined the Lebesgue measure: we’re going to define a set of sets on \(\mathbb R^d\) (the \(d\)-dimensional real numbers), we’re going to show that this set of sets fulfills properties which put it into a pair of families of sets (\(\pi\)-systems and a new family of sets, \(\lambda\)-systems), and then we’re going to show that we can define a measure using an extension from these sets of sets to our space of interest. This is very similar to how we defined the Lebesgue measure using Carathéodory’s Extension Theorem 2.2.
Then, we’ll show how we can build out integration on this new measure space, using Fubini’s theorem, to extend what we already learned about integration on the real numbers to the new space. Finally, we’ll show how we can use pushforward measures and the change of variables formula to actually compute these integral expressions we came up with over the course of this chapter.
3.4.2. Some new families of sets#
So far, we’ve seen \(\pi\)-systems, algebras, and \(\sigma\)-algebras, and seen how these families are progressively more restrictive (all \(\sigma\)-algebras are algebras, but not all algebras are \(\sigma\)-algebras, and so forth). If you recall, a \(\pi\)-system, for instance, was a non-empty collection closed under finite intersections.
A \(\sigma\)-algebra, on the other hand, contained the whole set, was closed under complements, and was closed under countable unions. We learned that \(\pi\)-systems were relaxations of \(\sigma\)-algebras, in that all \(\sigma\)-algebras were \(\pi\)-systems, but not all \(\pi\)-systems were \(\sigma\)-algebras.
So, what is the key ingredient that \(\pi\)-systems are missing that makes them not quite \(\sigma\)-algebras? From the definition, it looks like just about everything, but we can be a bit more specific. We can differentiate \(\pi\)-systems from \(\sigma\)-algebras succinctly: the \(\pi\)-systems that are \(\sigma\)-algebras are exactly those that are also \(\lambda\)-systems, which is a new family of sets:
Definition (\(\lambda\)-system)

Suppose that \(\Omega\) is a set. \(\mathcal L\) is called a \(\lambda\)-system on \(\Omega\) if it is a non-empty collection of subsets \(L \in \mathcal L\), s.t. \(L \subseteq \Omega\), where:
Contains the entire set: \(\Omega \in \mathcal L\),
Closure under contained set differences: If \(L_1, L_2 \in \mathcal L\) and \(L_1 \subseteq L_2\), then \(L_2 \setminus L_1 \in \mathcal L\), and
Closed under countable increasing unions: If \(\{L_n\}_{n \in \mathbb N} \subseteq \mathcal L\) is s.t. \(L_n \uparrow L = \bigcup_{n \in \mathbb N} L_n\), then \(L \in \mathcal L\).
Remember that the notation \(L_n \uparrow L\) means that \(\{L_n\}_n\) is converging to \(L\) from below, as-per Definition 2.30. This looks really close to the definition of a \(\sigma\)-algebra, but there is a key difference: the \(\lambda\)-system is only closed under countable increasing unions. This property can be restated another way to look even closer to the definition of a \(\sigma\)-algebra:
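To make the definition concrete, here is a small sketch (my own toy example, not from the text): on a finite set, the subsets of even cardinality form a \(\lambda\)-system that is not a \(\pi\)-system, which we can verify by brute force:

```python
from itertools import combinations

# Toy illustration (hypothetical example): on a finite Omega, the collection
# of subsets with an even number of elements is a lambda-system, but it fails
# to be a pi-system.
omega = frozenset({1, 2, 3, 4})
subsets = [frozenset(c) for r in range(len(omega) + 1)
           for c in combinations(sorted(omega), r)]

# Candidate family: all subsets of Omega of even cardinality.
L = {A for A in subsets if len(A) % 2 == 0}

# Property 1: contains the entire set.
assert omega in L

# Property 2: closed under contained set differences (L1 subset of L2 implies
# L2 \ L1 in L, since |L2 \ L1| = |L2| - |L1| is even).
assert all(B - A in L for A in L for B in L if A <= B)

# Property 3: closed under countable increasing unions. On a finite Omega any
# increasing sequence of sets is eventually constant, so this holds trivially.

# However, L is not closed under finite intersections, so it is not a pi-system:
assert frozenset({1, 2}) & frozenset({2, 3}) not in L
```

On a finite \(\Omega\) the increasing-union axiom is automatic, which is why only the first two axioms need an explicit check here.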
Exercise (\(\lambda\)-systems are closed under countable disjoint unions)

Suppose that \(\Omega\) is a set, and \(\mathcal L\) is a \(\lambda\)-system. Show that \(\mathcal L\) is closed under countable unions of pairwise disjoint sets; i.e., if \(\{L_n\}_{n \in \mathbb N} \subseteq \mathcal L\), where for all \(i \neq j\), \(L_i \cap L_j = \varnothing\), then \(\bigcup_{n \in \mathbb N}L_n \in \mathcal L\).
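One possible route for this exercise (a sketch, not necessarily the intended solution) handles pairs first, and then passes to the countable case via increasing unions:

```latex
% Pairwise case: if L_1, L_2 \in \mathcal L are disjoint, then
% L_2 \subseteq \Omega \setminus L_1, and each step below uses only
% \Omega \in \mathcal L and contained set differences:
L_1 \cup L_2 = \Omega \setminus \big( (\Omega \setminus L_1) \setminus L_2 \big) \in \mathcal L.
% Countable case: by induction, the partial unions
U_n \triangleq \bigcup_{m = 1}^{n} L_m
% lie in \mathcal L, and U_n \uparrow \bigcup_{n \in \mathbb N} L_n, so the full
% union is in \mathcal L by closure under countable increasing unions.
```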
So we have closure under countable disjoint unions, whereas \(\sigma\)-algebras were closed under all countable unions.
We can also use this to obtain another fundamental property of \(\lambda\)-systems, that will be useful to us below:
Lemma (\(\lambda\)-systems are closed under complements)

Suppose that \(\Omega\) is a set, and \(\mathcal L\) is a \(\lambda\)-system. Then if \(L \in \mathcal L\), \(L^c \in \mathcal L\).
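A quick sketch of why this holds, using only the defining properties of a \(\lambda\)-system:

```latex
% Since L \subseteq \Omega and \Omega \in \mathcal L, closure under contained
% set differences gives complement closure in one step:
L^c = \Omega \setminus L \in \mathcal L.
```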
As it turns out, \(\sigma\)-algebras can equivalently be characterized as systems which are both \(\pi\) and \(\lambda\) systems:
Theorem (Equivalence of \(\sigma\)-algebras and \((\pi, \lambda)\) systems)

Suppose that \(\Omega\) is a set. Then \(\mathcal F\) is a \(\sigma\)-algebra on \(\Omega\) if and only if \(\mathcal F\) is both a \(\pi\) and a \(\lambda\) system.
Proof. 1. \(\Rightarrow)\) Suppose that \(\mathcal F\) is a \(\sigma\)-algebra.
Notice that by construction, \(\mathcal F\) contains the whole set, is closed under complements, and is closed under countable unions. Then \(\mathcal F\) is clearly a \(\lambda\)-system: it contains \(\Omega\); for \(F_1 \subseteq F_2\), \(F_2 \setminus F_1 = F_2 \cap F_1^c \in \mathcal F\), since \(\mathcal F\) is closed under complements and countable (hence finite) intersections; and closure under countable increasing unions is a special case of closure under countable unions.
Further, \(\mathcal F\) is clearly a \(\pi\)-system, because it is closed under countable intersections by Property 2.1, and hence also closed under finitely many intersections.
\(\Leftarrow)\): Suppose that \(\mathcal F\) is both a \(\pi\)-system and a \(\lambda\)-system. We have three properties to prove:
Contains \(\Omega\): note that the definition of a \(\lambda\)-system ensures that \(\Omega \in \mathcal F\).
Closure under complements: note that \(\lambda\)-systems are closed under complements (as we showed above), so \(\mathcal F\) is closed under complements.
Closure under countable unions: Suppose that \(F_n \in \mathcal F\), for all \(n \in \mathbb N\). Define the sequence of sets:

\[E_1 \triangleq F_1, \qquad E_n \triangleq F_n \setminus \bigcup_{m = 1}^{n - 1}F_m, \quad n > 1.\]

Note that for \(n > 1\), \(E_n\) can be expressed as:

\[E_n = F_n \cap \bigcap_{m = 1}^{n - 1}F_m^c.\]

Notice that \(\mathcal F\) is closed under complements, since it is a \(\lambda\)-system (as shown above), so \(F_m^c \in \mathcal F\). \(\mathcal F\) is closed under finite intersections since it is a \(\pi\)-system, so \(\bigcap_{m = 1}^{n - 1}F_m^c \in \mathcal F\), and intersecting once more with \(F_n\) gives \(E_n \in \mathcal F\).
Finally, notice that \(\bigcup_{n \in \mathbb N}E_n \in \mathcal F\): the \(E_n\) are pairwise disjoint by construction, so this follows from closure under countable disjoint unions, which we showed above for \(\lambda\)-systems.
Then noting that \(\bigcup_{n \in \mathbb N}E_n = \bigcup_{n \in \mathbb N}F_n\), it is clear that \(\bigcup_{n \in \mathbb N}F_n \in \mathcal F\).
This characterization can be used to prove a special result that will help us extend measures to more complicated spaces:
Theorem (Dynkin’s \(\pi\)-\(\lambda\) Theorem)

Suppose that \(\Omega\) is a set, and let \(\mathcal P\) and \(\mathcal L\) be a \(\pi\)-system and a \(\lambda\)-system on \(\Omega\), respectively. Then if \(\mathcal P \subseteq \mathcal L\), \(\sigma(\mathcal P) \subseteq \mathcal L\).
3.4.3. Change of variables#
Just because we showed that you can calculate all the integrals we have discussed thus far doesn’t mean we want to. The reason we have introduced all this new notation, which might feel cumbersome at first, is that it ultimately makes random variables, and the concepts deriving from them, much easier to wrap your head around. Plainly, the expressions just happen to be a lot simpler, and the rules easier to follow.
However, when you actually need to arrive at a numerical solution to an integral, it’s typically a lot easier to use the rules of Riemann integration that you’ve learned through Calculus, whenever possible, than it is to use rules like Theorem 3.1, Definition 3.8, or Definition 3.10. So, ideally, we want to benefit from the conceptual niceties of the measure-theoretic integration intuition that we’ve developed, but still be able to calculate these things numerically in practice. This is done via the change of variables formula, where we change the sets we are integrating over from one space to another, using a measurable function:
Theorem (Change of variables)
Suppose that \((\Omega, \mathcal F, \mu)\) is a measure space, and that \((S, \Sigma)\) and \((\mathbb R, \mathcal R)\) are measurable spaces, where \(X \in m(\mathcal F, \Sigma)\) and \(f \in m(\Sigma, \mathcal R)\) are measurable functions, and either:
\(f \geq 0\), or
\(f(X)\) is \(\mu\)-integrable.
Then with \(X_*\mu = \mu \circ X^{-1} : \Sigma \rightarrow \bar{\mathbb R}_{\geq 0}\) the pushforward measure of \(\mu\) (Definition 2.40), and \(X(\Omega) \triangleq \{X(\omega) : \omega \in \Omega\}\):

\[\int_\Omega f(X(\omega))\,\mu(d\omega) = \int_{X(\Omega)} f(s)\,X_*\mu(ds) = \int_S f(s)\,X_*\mu(ds).\]
So, what does this formula let us do? What this formula allows is that we can take a measurable function like \(X\), we can compute functions of this measurable function \(f(X)\), and then can compute integrals of \(f(X)\) using not the space upon which \(X\) was defined, but on the space upon which \(f\) is defined. If the codomain, for instance, is a Riemann space (e.g., \(S = \mathbb R^d\)), then we have it really nice: we can use all of the rules we learned in Riemann calculus (the traditional approach to single and multivariable introductory calculus courses). This is called a change of variables because we are changing from integrating over \(\Omega\) to \(X(\Omega)\) using the pushforward measure \(X_*\mu\).
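The formula is easy to verify numerically on a finite measure space, where integrals are just weighted sums. The sketch below (my own toy construction; the measure, map, and function are all made up for illustration) checks both sides of the change of variables formula:

```python
# Toy check of change of variables on a finite measure space: the integral of
# f(X) over Omega against mu equals the integral of f over the codomain
# against the pushforward measure X_* mu.
mu = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.2, "e": 0.1}   # measure on Omega
X = {"a": 0, "b": 1, "c": 1, "d": 2, "e": 0}              # X : Omega -> S
f = lambda s: s ** 2 + 1                                  # f : S -> R

# Left-hand side: integrate f(X(omega)) directly over Omega.
lhs = sum(f(X[w]) * m for w, m in mu.items())

# Pushforward measure on singletons: X_* mu({s}) = mu(X^{-1}({s})).
pushforward = {}
for w, m in mu.items():
    pushforward[X[w]] = pushforward.get(X[w], 0.0) + m

# Right-hand side: integrate f over the codomain against X_* mu.
rhs = sum(f(s) * m for s, m in pushforward.items())

assert abs(lhs - rhs) < 1e-12
```

Note that the right-hand side only needs the pushforward measure and \(f\); the original space \(\Omega\) never appears, which is exactly the point of the formula.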
For now, let’s work on proving this result. We’ll do it in four steps, using a similar approach to what we did when we first learned about integration in Section 3.1, where we build up the equality from indicators, to simple functions, to non-negative functions, to \(\mu\)-integrable functions. Throughout the below proof, we will repeatedly use the fact that the pushforward measure is a measure, Lemma 2.7. This means that measurable functions \(f \in m(\Sigma, \mathcal R)\) will be equipped with all of the niceties that we learned about in the preceding section on integration, Section 3.1:
Proof. 1. Indicator functions: Suppose that \(B \in \Sigma\), and \(f \triangleq \mathbb 1_{\{B\}}(s)\). Then:

\[\int_\Omega f(X(\omega))\,\mu(d\omega) = \int_\Omega \mathbb 1_{\{B\}}(X(\omega))\,\mu(d\omega) = \mu(X^{-1}(B)) = X_*\mu(B) = \int_S f(s)\,X_*\mu(ds).\]
Where in the final line, we use that \(f \in m(\Sigma, \mathcal R)\), the definition of a pushforward measure, Definition 2.40, and the definition of the integral of an indicator with respect to a measure, Definition 3.2 (noting that an indicator is a simple function).
2. Simple functions: Suppose that \(\{B_n\}_{n \in [m]} \subseteq \Sigma\) for \(m \in \mathbb N\), and \(\{\alpha_n\}_{n \in [m]} \subseteq \mathbb R\), where \(\{B_n\}_{n \in [m]}\) are disjoint.
Define \(f(s) \triangleq \sum_{n \in [m]}\alpha_n\mathbb 1_{\{B_n\}}(s)\) to be a simple function. Then:

\[\int_\Omega f(X(\omega))\,\mu(d\omega) = \int_\Omega \sum_{n \in [m]}\alpha_n\mathbb 1_{\{B_n\}}(X(\omega))\,\mu(d\omega) = \sum_{n \in [m]}\alpha_n\int_\Omega \mathbb 1_{\{B_n\}}(X(\omega))\,\mu(d\omega),\]
Which follows since the sum of simple functions is simple by Lemma 3.1. Continuing and using the result from 1.:

\[\sum_{n \in [m]}\alpha_n\int_\Omega \mathbb 1_{\{B_n\}}(X(\omega))\,\mu(d\omega) = \sum_{n \in [m]}\alpha_n X_*\mu(B_n) = \int_S f(s)\,X_*\mu(ds),\]
Which follows by Lemma 3.1 and the fact that the pushforward measure is a measure, Definition 2.40.
3. Non-negative functions: Suppose \(f \geq 0\), and define:

\[f_n(x) \triangleq \min\left(\frac{\lfloor 2^n f(x) \rfloor}{2^n},\, n\right).\]
Intuitively, \(f_n(x)\) is \(f(x)\) “rounded off” to the nearest \(\frac{1}{2^n}\) for a given \(x\), and is further upper-bounded by \(n\).
Note that each \(f_n(x)\) is simple because with an upper bound of \(n\), there are finitely many ways to round off the value \(f_n(x)\) (taking values in \([0, n]\)) to the nearest \(\frac{1}{2^n}\) (there are \(2^n n\) possible ways to do this).
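This dyadic construction can be checked numerically. The sketch below (my own illustration; the choice of \(f\) and the sample point are arbitrary) confirms that the approximations increase to \(f\) from below:

```python
import math

# Dyadic approximation from step 3: f_n(x) = min(floor(2^n f(x)) / 2^n, n)
# is simple, and f_n(x) increases to f(x) from below.
def f(x):
    return math.exp(x)          # any non-negative function serves as an example

def f_n(n, x):
    # f(x) rounded down to the nearest 1/2^n, then capped at n
    return min(math.floor(2 ** n * f(x)) / 2 ** n, n)

x = 1.3
vals = [f_n(n, x) for n in range(1, 12)]

assert all(a <= b for a, b in zip(vals, vals[1:]))   # monotone non-decreasing
assert all(v <= f(x) for v in vals)                  # approximates from below
assert f(x) - vals[-1] < 2 ** -10                    # within 1/2^11 at n = 11
```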
Further, note that \(f_n \uparrow f\) as \(n \rightarrow \infty\). By definition, \(f_n \geq 0\), and since \(f_n \uparrow f\) pointwise, \(f_n(X) \uparrow f(X)\) as \(n \rightarrow \infty\) everywhere (and certainly also \(a.e.\)). Then by the MCT Theorem 3.5:

\[\int_\Omega f(X(\omega))\,\mu(d\omega) = \lim_{n \rightarrow \infty}\int_\Omega f_n(X(\omega))\,\mu(d\omega) = \lim_{n \rightarrow \infty}\int_S f_n(s)\,X_*\mu(ds) = \int_S f(s)\,X_*\mu(ds),\]

where the middle equality applies the result from 2., since each \(f_n\) is simple, and the final equality applies the MCT again, with respect to \(X_*\mu\).
4. Integrable functions: Write \(f(x) = f^+(x) - f^-(x)\), where \(f^+(x) = f(x) \vee 0\), and \(f^-(x) = (-f(x)) \vee 0\), as-per Definition 3.10.
Then since \(f(X)\) is \(\mu\)-integrable, both \(f^+(X)\) and \(f^-(X)\) are \(\mu\)-integrable, by Equation (3.4).
Direct application of Definition 3.10 gives:

\[\int_\Omega f(X(\omega))\,\mu(d\omega) = \int_\Omega f^+(X(\omega))\,\mu(d\omega) - \int_\Omega f^-(X(\omega))\,\mu(d\omega).\]
Note further that \(f^+, f^- \geq 0\). Then by 3.:

\[\int_\Omega f^+(X(\omega))\,\mu(d\omega) - \int_\Omega f^-(X(\omega))\,\mu(d\omega) = \int_S f^+(s)\,X_*\mu(ds) - \int_S f^-(s)\,X_*\mu(ds) = \int_S f(s)\,X_*\mu(ds),\]
Where in the final line, we used that \(f\) is a measurable function with domain \((S, \Sigma)\) and codomain \((\mathbb R, \mathcal R)\), Definition 3.10, and the definition of a pushforward measure Definition 2.40.
As an important remark, notice that we didn’t explicitly show the right-most equality, but this turns out to be pretty easy:
Note that in the change of variables formula, the following two are equivalent, which we did not explicitly prove above:

\[\int_{X(\Omega)} f(s)\,X_*\mu(ds) = \int_S f(s)\,X_*\mu(ds).\]
This is because for any \(r \not \in X(\Omega)\), \(X^{-1}(\{r\}) = \varnothing\). Noting that \(X_*\mu\) is a measure, and hence the measure of the empty set is \(0\), we can just as easily write:

\[\int_S f(s)\,X_*\mu(ds) = \int_{X(\Omega)} f(s)\,X_*\mu(ds) + \int_{S \setminus X(\Omega)} f(s)\,X_*\mu(ds),\]

where the second term is \(0\), because \(X_*\mu(S \setminus X(\Omega)) = \mu(X^{-1}(S \setminus X(\Omega))) = \mu(\varnothing) = 0\).
Therefore:

\[\int_{X(\Omega)} f(s)\,X_*\mu(ds) = \int_S f(s)\,X_*\mu(ds).\]
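This last observation is also easy to see numerically. In the sketch below (my own toy example), the codomain \(S\) is strictly larger than the image \(X(\Omega)\), and the extra points carry zero pushforward mass, so integrating over \(S\) or over \(X(\Omega)\) agree:

```python
# Points of S outside the image X(Omega) have empty preimage, hence zero
# pushforward mass, so they contribute nothing to the integral.
mu = {"a": 0.5, "b": 0.5}        # measure on Omega = {a, b}
X = {"a": 1, "b": 2}             # image X(Omega) = {1, 2}
S = [0, 1, 2, 3]                 # codomain strictly larger than the image

def pushforward(s):
    # X_* mu({s}) = mu(X^{-1}({s})); an empty preimage contributes zero mass.
    return sum(m for w, m in mu.items() if X[w] == s)

f = lambda s: 10 * s

over_S = sum(f(s) * pushforward(s) for s in S)
over_image = sum(f(s) * pushforward(s) for s in set(X.values()))

assert pushforward(0) == 0 and pushforward(3) == 0
assert over_S == over_image
```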