3.2. Properties of Integration
That last section was pretty neat stuff, eh?
While this might seem like all we did was introduce a bunch of cumbersome notation to “reinvent the wheel” and give you back your Riemann integral from calculus, it turns out this is a lot more powerful than that stuff. In particular, you’ll notice that we were extremely cautious every step of the way, from the preceding chapters all the way to now, about which assumptions each result holds under. These might seem like “mindless details” that you’d rather go without, but the brilliance of probability theory is in the details.
We have been building these properties from the ground up, stating every assumption explicitly along the way. As we build more and more results, we’re going to keep that trend up, and all these assumptions and conditions will start making sense as you see that extremely weak conditions can add up to a beautiful result. As you’ll see in the next section, the concept of expected value in probability theory is understood as a special case of integration with respect to a probability measure, so while we’re unfortunately going to burden you with some more properties of integrals, it will start to tie back to probability theory more directly next.
3.2.1. Norms and Convexity
In this section, we’ll learn some details about two concepts that we will later see are closely related: norms and convex functions. When we attempt to classify random variables in the next chapter, we’ll use these two concepts to do so.
3.2.1.1. Convex Functions
We’ll start off with a definition you’ve probably seen in some form before:
Definition 3.11 (Convex Function)
A function
Let’s think about what this means, intuitively. First: what the heck is a convex subset
The idea here is that in this case, the point

Fig. 3.2 (A) The red line represents the line
An important consequence that we will use throughout this course is a relatively simple real analysis result:
Lemma 3.5 (Convex functions and second derivatives)
A function
What this asserts is that if the function
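To make the link between the chord definition and the second-derivative test concrete, here is a small numerical sketch. It assumes Lemma 3.5 is the standard statement that a twice-differentiable function is convex exactly when its second derivative is non-negative. The helper names `is_convex_on_grid` and `second_derivative` are illustrative and not part of the text, and a grid check can only suggest convexity, never prove it.

```python
import math


def is_convex_on_grid(f, a, b, n=50, tol=1e-12):
    """Check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) on a finite grid.

    Passing this check is evidence of convexity on [a, b], not a proof.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ts = [j / 10 for j in range(11)]
    for x in xs:
        for y in xs:
            for t in ts:
                point = t * x + (1 - t) * y
                chord = t * f(x) + (1 - t) * f(y)
                if f(point) > chord + tol:
                    return False
    return True


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


# exp has a positive second derivative everywhere and passes the chord check;
# sin has a negative second derivative on (0, pi) and fails it there.
exp_is_convex = is_convex_on_grid(math.exp, -2.0, 2.0)
sin_is_convex = is_convex_on_grid(math.sin, 0.0, math.pi)
```

The two checks agree in both cases: the sign of the (approximate) second derivative matches the outcome of the chord test.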
Now, we’re going to talk about some subtler points that are consequences of the definition of convex functions. In my opinion, the proofs and intuition behind these results go somewhat beyond an introductory real analysis course, so you shouldn’t worry if they don’t immediately make sense to you:
Lemma 3.6 (Subderivatives of convex functions)
Suppose that the function
That
are both finite, and

Fig. 3.3 In this case, we can see two sub-tangent lines at the point
Now we get to one of the fundamental results of integration:
Theorem 3.2 (Jensen’s Inequality)
Suppose that:
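$(\Omega, \mathcal{F}, \mu)$ is a probability space, $I \subseteq \mathbb{R}$ is an interval, $f : \Omega \to I$ is a measurable function, $\varphi : I \to \mathbb{R}$ is convex, and $f$ and $\varphi \circ f$ are $\mu$-integrable.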
Then:
This result, it turns out, is pretty easy to prove if
Proof. Let
Let
Letting
Note, then, that
It is pretty clear that
Then by Corollary 3.1, since
where by construction,
So: why did we have to use sub-derivatives, and what did they let us do? Well, since
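As a quick sanity check of Jensen’s inequality, the sketch below works on a finite probability space, where integrals reduce to weighted sums. It assumes the usual form of the inequality, namely that a convex function of the mean is at most the mean of the convex function. The sample values, the probabilities, and the helper names `expectation` and `jensen_gap` are made up for illustration.

```python
import math


def expectation(values, probs):
    """Integral of a simple function against a finite probability measure."""
    return sum(v * p for v, p in zip(values, probs))


def jensen_gap(phi, values, probs):
    """E[phi(X)] - phi(E[X]); non-negative whenever phi is convex."""
    mean = expectation(values, probs)
    return expectation([phi(v) for v in values], probs) - phi(mean)


# Hypothetical random variable taking four values.
values = [-1.0, 0.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]

gap_square = jensen_gap(lambda x: x * x, values, probs)  # this is Var(X)
gap_exp = jensen_gap(math.exp, values, probs)
gap_abs = jensen_gap(abs, values, probs)
```

For the square function the gap is exactly the variance, which is one way to remember which direction the inequality points.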
3.2.1.2. Norms
Jensen’s inequality will make properties of norms of random variables very easy to prove. What’s a norm, you might ask?
Definition 3.12 (Functional norm)
Suppose that
We tend to classify functions as those that have finite functional norms:
Definition 3.13 (
Suppose that
We can use functional norms to obtain some more desirable properties of integration. Let’s check out Hölder’s inequality:
Theorem 3.3 (Hölder’s inequality)
Suppose that
Proof. If
In this case, the product
Therefore, suppose that
Note that for any
Notice that
Taking
which holds for all
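Hölder’s inequality is easy to spot-check numerically on a finite measure space. The sketch assumes the functional norm of Definition 3.12 is the usual $p$-norm, $(\int |f|^p \, d\mu)^{1/p}$, and that the exponents are conjugate, $1/p + 1/q = 1$. The function values, the weights, and the names `p_norm` and `holder_sides` are illustrative choices.

```python
def p_norm(f, weights, p):
    """(sum |f|^p * weight)^(1/p): the p-norm on a finite measure space."""
    return sum(abs(v) ** p * w for v, w in zip(f, weights)) ** (1.0 / p)


def holder_sides(f, g, weights, p):
    """Return (||f g||_1, ||f||_p * ||g||_q) with q the conjugate of p > 1."""
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) * w for a, b, w in zip(f, g, weights))
    rhs = p_norm(f, weights, p) * p_norm(g, weights, q)
    return lhs, rhs


# Hypothetical functions on a three-point space with measure `weights`.
f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
weights = [0.2, 0.5, 0.3]

results = {p: holder_sides(f, g, weights, p) for p in (1.5, 2.0, 3.0)}
```

Taking `p = 2` and `g = f` gives equality, which recovers the familiar equality case of the Cauchy-Schwarz inequality.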
3.2.2. Convergence Results
3.2.2.1. Convergence Concepts
Next, we get to the convergence theorems for integrals. To do this, we first need two quick definitions to get us started:
Definition 3.14 (Convergence Almost Everywhere)
Suppose the measure space
The idea here is that for all but a set of measure
We can also understand this definition using the concept of the
Definition 3.15 (Equivalent definition for convergence almost everywhere)
Suppose the measure space
In this definition, the intuition is that we are focusing on a sequence of sets (indexed by
Remember that
So, the interpretation of
By construction, since
Next, we get to a practically distinct definition, which almost looks the same. This concept is called convergence in measure:
Definition 3.16 (Convergence in measure)
Suppose the measure space
While these definitions almost look the same, the practical distinction is that the limit, this time, is outside of the measure statement. The idea here is that, as
Lemma 3.7 (Convergence almost everywhere implies convergence in measure)
Suppose the measure space
Proof. Suppose that
Let
By definition of
By construction, note that
Further, note that by design, for any
Then for all
Then
Then by definition of a measure,
Then since
by the convergence from above property of measures, Property 2.6.
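The implication in Lemma 3.7 runs in one direction only. A classic counterexample to the converse, which is not taken from this text, is the “typewriter” sequence on $[0, 1)$ with Lebesgue measure: indicators of dyadic intervals that sweep across the unit interval again and again while shrinking. The sketch below assumes that setting; the function names are illustrative.

```python
def typewriter_interval(n):
    """Dyadic interval [j / 2^k, (j + 1) / 2^k) for n = 2^k + j, n >= 1."""
    k = n.bit_length() - 1
    j = n - 2**k
    return j / 2**k, (j + 1) / 2**k


def f(n, x):
    """Indicator of the n-th typewriter interval, evaluated at x."""
    left, right = typewriter_interval(n)
    return 1.0 if left <= x < right else 0.0


def measure_exceeding(n):
    """Lebesgue measure of {x : |f_n(x)| > eps} for any 0 < eps < 1."""
    left, right = typewriter_interval(n)
    return right - left


def hits_per_level(x, k):
    """How many n in [2^k, 2^(k+1)) satisfy f_n(x) = 1."""
    return sum(f(n, x) for n in range(2**k, 2 ** (k + 1)))
```

The measure of the set where $f_n$ differs from $0$ shrinks like $2^{-k}$, so $f_n \to 0$ in measure. Yet every point is hit exactly once per level $k$, so $f_n(x) = 1$ infinitely often and the sequence converges at no point.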
We aren’t quite ready to handle the result that
Let’s see what these two concepts will allow us to do.
3.2.2.2. Convergence Theorems
Theorem 3.4 (Bounded Convergence)
Suppose the measure space
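$E$ is a $\mu$-finite set, where $\mu(E) < \infty$, $\{f_n\}_{n \in \mathbb{N}}$ is a sequence of measurable functions which vanish on $E^c$; that is, $f_n(\omega) = 0$ for all $\omega \in E^c$, there exists $M > 0$ s.t. $|f_n| \leq M$ for all $n$ (the $f_n$ are each bounded), and $f_n \to f$ in measure.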
Then:
The idea here is that we are conceptually moving the limit across the integral: the left hand side can be thought of as the integral of
Proof. Suppose that
Then by Theorem 3.1:
For the left-hand expression, we used that for
Continuing:
since
So, intuitively, as the functions
Lemma 3.8 (Fatou)
Suppose the measure space
Proof. Define
Note that as
Then:
Since
Let
Then:
Note that both sides are at most upper-bounded by
Taking the sup over
Notice that, by the definition of convergence from below, Lemma 3.2 applies in the bottom line, and we are finished.
This is clearly much more general than the Bounded Convergence Theorem: the only restriction we have here is that we have a sequence of non-negative functions; we don’t need a sequence of functions which is bounded and converging in measure.
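The inequality in Fatou’s Lemma can be strict. A standard example, assumed here rather than taken from the text, is a sequence of tall, thin spikes $f_n = n \cdot 1_{(0, 1/n)}$ on $[0, 1]$ with Lebesgue measure: each spike has integral $1$, but the spikes escape every fixed point, so the pointwise limit is $0$. The function names below are illustrative.

```python
def f(n, x):
    """Spike of height n supported on the open interval (0, 1/n)."""
    return float(n) if 0.0 < x < 1.0 / n else 0.0


def integral_f(n):
    """Exact Lebesgue integral of f_n over [0, 1]: height n times width 1/n."""
    return n * (1.0 / n)


def pointwise_tail_inf(x, n_start, n_end):
    """inf of f_n(x) over n_start <= n < n_end, a finite stand-in for liminf."""
    return min(f(n, x) for n in range(n_start, n_end))
```

Here the integral of the pointwise liminf is $0$, while the liminf of the integrals is $1$, so the two sides of Fatou’s Lemma differ. No integrable function dominates all of the spikes at once.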
When the functions are converging from below, we can further clarify the nature of the convergence with a slight extension of Fatou’s Lemma: the integrals will also converge from below. In other words, functions converging “monotonically” have “monotonically” converging integrals:
Theorem 3.5 (Monotone Convergence (MCT))
Suppose the measure space
as
As you may notice, this theorem statement looks a lot like the statement of Fatou’s Lemma (Lemma 3.8), and in fact, we’ll use some of the intuition from Fatou’s Lemma to make this proof rigorous:
Proof. By Fatou’s Lemma (Lemma 3.8),
since
Conversely, as
Together, this gives that:
Which is because we have that
Finally, note that
for all
Then
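To see monotone convergence in action, consider the truncations $f_n(x) = \min(x^{-1/2}, n)$ on $(0, 1]$ with Lebesgue measure. This example is an assumption chosen for illustration: the $f_n$ increase pointwise to the unbounded function $x^{-1/2}$, and a direct calculation gives $\int f_n = 2 - 1/n$, which increases to $\int_0^1 x^{-1/2} \, dx = 2$. The sketch approximates the integrals with a midpoint rule, a numerical stand-in for the Lebesgue integral.

```python
def truncated(n, x):
    """f_n(x) = min(x^(-1/2), n), which increases in n for every x in (0, 1]."""
    return min(x ** -0.5, float(n))


def midpoint_integral(g, a, b, steps=50_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def integral_truncated(n):
    """Approximate integral of f_n over (0, 1]."""
    return midpoint_integral(lambda x: truncated(n, x), 0.0, 1.0)


ns = (1, 2, 5, 10)
approx = [integral_truncated(n) for n in ns]
exact = [2.0 - 1.0 / n for n in ns]
```

The approximate integrals increase with $n$ and stay below $2$, matching the closed form.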
Next, we’ll see another application of Fatou’s Lemma, which is called the Dominated Convergence Theorem. Basically, what this theorem asserts is that if a sequence of measurable functions
Theorem 3.6 (Dominated Convergence (DCT))
Suppose the measure space
, is a function where , and is -integrable, for all .
Then:
The idea here is that
Proof. Note that since
By subtracting
which follows by Property 3.12.
Applying the same approach to
Since
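A small numerical illustration of dominated convergence, under assumptions chosen for this sketch: take $f_n(x) = x^n$ on $[0, 1]$ with Lebesgue measure. Each $f_n$ is dominated by the integrable constant function $g = 1$, and $f_n \to 0$ everywhere except at $x = 1$, a set of measure zero. The integrals are $1 / (n + 1)$, which do converge to the integral of the limit, $0$. The midpoint rule stands in for the integral, and the names are illustrative.

```python
def f(n, x):
    """f_n(x) = x^n on [0, 1], bounded above by the constant function 1."""
    return x**n


def midpoint_integral(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def integral_f(n):
    """Approximate integral of f_n over [0, 1]; the exact value is 1/(n+1)."""
    return midpoint_integral(lambda x: f(n, x), 0.0, 1.0)


ns = (1, 5, 50)
approx = [integral_f(n) for n in ns]
```

Contrast this with spikes of the form $n \cdot 1_{(0, 1/n)}$: they also converge to $0$ almost everywhere, but no integrable function dominates them, and their integrals stay at $1$.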
3.2.3. Measure Restriction
The final building block we will need in integration is the concept of measure restrictions. As its name somewhat suggests, a measure restriction basically lets us take an existing measure space
If you recall, we built machinery that worked on
Let’s give this idea a go:
Corollary 3.5 (Defining a measure by restriction)
Suppose that
$\{A_n\}_{n \in \mathbb{N}}$ is a set of disjoint events, $A = \bigcup_{n \in \mathbb{N}} A_n$, and $f$ is a $\mu$-integrable function.
Then:
Further, if
Proof. Define
, for all by construction since , and , since by supposition, is -integrable.
Then by the Dominated Convergence Theorem (Theorem 3.6):
The second-to-last result follows by noting that
Then by construction,
If further
We can repeat this argument for any
To see that
1. Contains
2. Closed under complements: Suppose that
Define
Notice that as
Since
3. Closed under countable unions: Suppose that
Then
Since
Then
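On a finite space, the content of Corollary 3.5 can be checked with exact arithmetic. The sketch assumes the corollary says that integrating over a disjoint union equals summing the integrals over the pieces, and that for non-negative $f$ the set function $\nu(A) = \int_A f \, d\mu$ is itself a measure. The fair-die measure, the function `f`, and the names `integrate_over` and `nu` are hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical discrete measure space: a fair six-sided die.
mu = {x: Fraction(1, 6) for x in range(1, 7)}


def f(x):
    """A non-negative integrable function on the die outcomes."""
    return x * x


def integrate_over(g, measure, subset):
    """Integral of g over `subset` against a discrete measure {point: mass}."""
    return sum(g(x) * measure[x] for x in subset)


def nu(subset):
    """Set function nu(A) = integral of f over A with respect to mu."""
    return integrate_over(f, mu, subset)


# Disjoint events and their union.
pieces = [{1, 2}, {3}, {4, 5, 6}]
whole = set().union(*pieces)
```

Because the masses are stored as `Fraction` values, additivity over the disjoint pieces holds exactly rather than up to floating-point error.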