3.2. Properties of Integration#

That last section was pretty neat stuff, eh?

While it might seem like all we did was introduce a bunch of cumbersome notation to “reinvent the wheel” and give you back your Riemann integral from calculus, it turns out this machinery is a lot more powerful than that. In particular, you’ll notice that we were extremely cautious every step of the way, from the preceding chapters all the way to now, regarding which assumptions different conditions hold under. These might seem like “mindless details” that you’d rather go without, but the brilliance of probability theory is in the details.

These properties that we are constructing, from the ground up, have been explicit about every assumption made along the way. As we build more and more results, we’re going to keep that trend up, and all these assumptions and conditions are going to start making sense as you begin to see that extremely weak conditions might add up to a result that is beautiful. As you’ll see in the next section, the concept of expected value in probability theory is understood as a special case of integration with respect to a probability measure, so while we’re going to unfortunately burden you with some more properties of integrals, it will start to tie back to probability theory more directly next.

3.2.1. Norms and Convexity#

In this section, we’ll learn some details about two concepts that we will later see are acutely related: norms and convex functions. When we attempt to classify random variables in the next chapter, we’ll use these two concepts to do so.

3.2.1.1. Convex Functions#

We’ll start off with a definition you’ve probably seen in some form before:

Definition 3.11 (Convex Function)

A function $\varphi : \mathcal X \rightarrow \mathbb R$ defined on a convex subset $\mathcal X \subseteq \mathbb R$ is convex if for all $\lambda \in (0, 1)$ and for all $x_1, x_2 \in \mathcal X$:

$$\lambda \varphi(x_1) + (1 - \lambda)\varphi(x_2) \geq \varphi(\lambda x_1 + (1 - \lambda)x_2).$$

Let’s think about what this means, intuitively. First: what the heck is a convex subset $\mathcal X$? All that this means is that for all $x_1, x_2 \in \mathcal X$, any possible convex combination $\lambda x_1 + (1 - \lambda)x_2$ of these elements is also in $\mathcal X$, for every $\lambda \in (0, 1)$. For instance, if we are dealing with real numbers, $\mathcal X$ could just be an interval.

The idea here is that the point $z = \lambda x_1 + (1 - \lambda)x_2$ is some point in between $x_1$ and $x_2$; this is ensured by the fact that $\lambda$ is between $0$ and $1$. Now, let’s think about the left side of the inequality. What happens as $\lambda$ changes? Graphically, it turns out that it looks something like this:

../../_images/convex.png

Fig. 3.2 (A) The red line represents the line $\lambda\varphi(x_1) + (1 - \lambda)\varphi(x_2)$ for some value $\lambda \in (0, 1)$. For a convex function, for any two points $x_1$ and $x_2$, this line will be entirely above the actual function, $\varphi(x)$. (B) A non-convex function $\psi$. Notice that we can choose points $x_1$ and $x_2$ where the red line is not always above the function.#

An important consequence that we will use throughout this course is a relatively simple real analysis result:

Lemma 3.5 (Convex functions and second derivatives)

A function $\varphi \in C^2 : \mathcal X \rightarrow \mathbb R$ defined on an interval $\mathcal X$ is convex on $\mathcal X$ if and only if for all $x \in \mathcal X$, $\varphi''(x) \geq 0$.

What this asserts is that if the function $\varphi$ is furthermore twice continuously differentiable, and the function is defined on an interval (which is a convex subset), we have a second way to check convexity, which is (often) much easier to work with: simply check its second derivative. Visually, this solidifies the intuitive notion of convexity demonstrated by Fig. 3.2(A): a convex function curves upwards.
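To make the second-derivative test concrete, here is a quick numerical sketch (purely illustrative, not part of the formal development): $\varphi(x) = e^x$ has $\varphi''(x) = e^x > 0$, so the chord inequality from Definition 3.11 should hold at every pair of points and every mixing weight. The helper name `chord_gap` is ours, not from the text.

```python
# Illustrate Lemma 3.5 on phi(x) = exp(x): its second derivative exp(x) > 0
# everywhere, so the chord inequality of Definition 3.11 should hold for
# every lambda in (0, 1). A sketch, not a proof.
import math

def chord_gap(phi, x1, x2, lam):
    """lam*phi(x1) + (1-lam)*phi(x2) - phi(lam*x1 + (1-lam)*x2); >= 0 means chord above."""
    return lam * phi(x1) + (1 - lam) * phi(x2) - phi(lam * x1 + (1 - lam) * x2)

# check the chord inequality on a grid of points and mixing weights
gaps = [chord_gap(math.exp, x1, x2, lam)
        for x1 in (-2.0, 0.0, 1.5)
        for x2 in (-1.0, 0.5, 3.0)
        for lam in (0.1, 0.5, 0.9)]
all_nonnegative = all(g >= 0 for g in gaps)
```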

Now, we’re going to talk about some nuanced points that are consequences of the definition of convex functions. In my opinion, the proofs/intuition of these results go somewhat beyond an introductory real analysis course, so you shouldn’t worry if they don’t immediately make sense to you:

Lemma 3.6 (Subderivatives of convex functions)

Suppose that the function $\varphi : \mathcal X \rightarrow \mathbb R$ defined on a convex open subset $\mathcal X \subseteq \mathbb R$ is convex. Then a subderivative at a point $x_0 \in \mathcal X$ is a real number $c$ s.t. for all $x \in \mathcal X$:

$$c \leq \frac{\varphi(x) - \varphi(x_0)}{x - x_0}.$$

That $\mathcal X$ is open means that if $x \in \mathcal X$, then the neighborhood $\{x + h : |h| < \epsilon\} \subseteq \mathcal X$ for some $\epsilon > 0$. As it turns out, for a convex function $\varphi$, we can take this even further, and can say that:

$$a_l = \lim_{h \downarrow 0}\frac{\varphi(x) - \varphi(x - h)}{h} \quad\text{and}\quad a_u = \lim_{h \downarrow 0}\frac{\varphi(x + h) - \varphi(x)}{h}$$

are both finite, and $[a_l, a_u]$ is called the subdifferential (it is the set of all of the subderivatives). When the function $\varphi$ is continuously differentiable on the entire domain $\mathcal X$, what we are talking about here is fairly rudimentary: $a_l$ and $a_u$ are the left and right derivatives at a point $x$ (and, in the case of continuously differentiable functions, these are equal). The nuance here can be shown with something that is convex but not continuously differentiable, like the absolute value:

../../_images/subderiv.png

Fig. 3.3 In this case, we can see two sub-tangent lines at the point $x = 0$: lines which intersect $f(x) = |x|$ at $x = 0$, but whose slopes are in the interval formed by the left and right derivatives, $-1$ and $1$ respectively. The sub-derivatives exist since $f(x) = |x|$ is convex; if you are struggling with why it is convex, think about the lines we drew previously in Fig. 3.2.#
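We can also sanity-check this picture for $f(x) = |x|$ numerically: the left and right difference quotients at $x = 0$ approach $-1$ and $+1$, and any slope $a \in [-1, 1]$ gives a sub-tangent line lying below the function. A small illustrative sketch, with helper names of our own choosing:

```python
# The sub-derivative interval of f(x) = |x| at x = 0: the left and right
# difference quotients converge to -1 and +1, so [a_l, a_u] = [-1, 1].
def left_quotient(f, x, h):
    return (f(x) - f(x - h)) / h

def right_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = abs
h = 1e-8
a_l = left_quotient(f, 0.0, h)   # slope approaching from the left
a_u = right_quotient(f, 0.0, h)  # slope approaching from the right

# any a in [a_l, a_u] gives a sub-tangent line a*(x - 0) + f(0) lying below f
a = 0.3
below = all(a * x <= abs(x) for x in (-2.0, -0.5, 0.0, 0.5, 2.0))
```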

Now we get to one of the fundamental results of integration:

Theorem 3.2 (Jensen’s Inequality)

Suppose that:

  1. $(\Omega, \mathcal F, P)$ is a probability space,

  2. $\mathcal X \subseteq \mathbb R$ is an interval,

  3. $f \in m\mathcal F : \Omega \rightarrow \mathcal X$ is a measurable function,

  4. $\varphi : \mathcal X \rightarrow \mathbb R$ is convex, and

  5. $f$ and $\varphi(f) = \varphi \circ f$ are $P$-integrable.

Then:

$$\varphi\left(\int f \, dP\right) \leq \int \varphi(f) \, dP.$$

This result, it turns out, is pretty easy to prove if $\varphi$ is also $C^2$, but that wouldn’t quite be as general as we want it to be: this result holds for any convex function, not just the $C^2$ ones. In the proof below, we use the concept of the sub-derivative that we just intuited our way through, and then we’ll recap why we had to use sub-derivatives at all once we’re done:

Proof. Let $c = \int f \, dP$, which is finite since $f$ is $P$-integrable (Definition 3.10).

Let $\ell(x) = ax + b$ be a linear function, for $a, b, x \in \mathbb R$, s.t. $\ell(c) = \varphi(c)$ and $\varphi(x) \geq \ell(x)$. Such a function exists since, by the convexity of $\varphi$, we have:

$$a_l = \lim_{h \downarrow 0}\frac{\varphi(c) - \varphi(c - h)}{h} \leq \lim_{h \downarrow 0}\frac{\varphi(c + h) - \varphi(c)}{h} = a_u.$$

Letting $a \in [a_l, a_u]$ ($a$ is a sub-derivative of $\varphi$) and $\ell(x) = a(x - c) + \varphi(c)$ gives the desired properties, as:

$$\begin{aligned}\varphi(x) &\geq a(x - c) + \varphi(c) = \ell(x), &&\text{defn. of a sub-derivative, } a_l \leq a \leq a_u \\ \ell(c) &= a(c - c) + \varphi(c) = \varphi(c).\end{aligned}$$

Note, then, that $\varphi(x) \geq \ell(x)$ for any $x \in \mathbb R$, and consequently, $\varphi \circ f \geq \ell \circ f$, so $\varphi \circ f \geq_{a.e.} \ell \circ f$.

It is pretty clear that $\ell \circ f$ is a rescaling of an integrable function $f$ by $a$ and a sum with a constant term, $-ac + \varphi(c)$. Particularly, note here that since we have a probability space, the measure of the entire space is finite, and the constant is integrable by Remark 3.1. Therefore, $\ell \circ f$ is integrable by Property 3.11 and Property 3.12.

Then by Corollary 3.1, since $\varphi \circ f$ is integrable by supposition:

$$\begin{aligned}\int \varphi \circ f \, dP \geq \int \ell \circ f \, dP &= \int (af - ac + \varphi(c)) \, dP \\ &= a\left(\int f \, dP - c\right) + \varphi(c) = \ell\left(\int f \, dP\right), &&\ell(x) = a(x - c) + \varphi(c) \\ &= \varphi\left(\int f \, dP\right), &&\ell(c) = \varphi(c)\end{aligned}$$

where by construction, $c = \int f \, dP$.

So: why did we have to use sub-derivatives, and what did they let us do? Well, since $\varphi$ is only convex, it is entirely possible that the derivative doesn’t exist at the point we are interested in (think about if the absolute value had its kink right at $x = \int f \, dP$ instead of at $x = 0$, like in the figure we considered; e.g., $\varphi(x) = |x - \int f \, dP|$). So, what we did was construct a sub-tangent line via a sub-derivative at this point, just to give us extra protection for the general case where we could have a non-existent derivative.
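As a quick illustrative check of Jensen’s inequality (not a proof), we can take $P$ to be the uniform distribution on $[0, 1]$, $f$ the identity, and the convex $\varphi(x) = x^2$, and compare $\varphi(\int f \, dP)$ against $\int \varphi(f) \, dP$ by Monte Carlo:

```python
# Monte Carlo sketch of Jensen's inequality with phi(x) = x**2 (convex) and
# P uniform on [0, 1]: phi(E[f]) <= E[phi(f)], with the gap being the variance.
import random

random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]

phi = lambda x: x ** 2
mean_f = sum(samples) / len(samples)                       # ~ integral of f dP = 1/2
mean_phi_f = sum(phi(x) for x in samples) / len(samples)   # ~ integral of phi(f) dP = 1/3

jensen_holds = phi(mean_f) <= mean_phi_f
```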

3.2.1.2. Norms#

Jensen’s inequality will make properties about norms of random variables very easy to prove. What’s a norm you might ask?

Definition 3.12 (Functional norm)

Suppose that $(\Omega, \mathcal F, \mu)$ is a measure space, and suppose that $p \in [1, \infty)$. The functional norm of a function $f \in m\mathcal F : \Omega \rightarrow \mathbb R$ is:

$$||f||_p \equiv \left(\int |f|^p \, d\mu\right)^{\frac1p}.$$

We tend to classify functions as those that have finite functional norms:

Definition 3.13 (Lp space)

Suppose that $(\Omega, \mathcal F, \mu)$ is a measure space, and suppose that $p \in [1, \infty)$. Then the $L^p(\mu)$ space is the set of measurable functions:

$$L^p(\mu) \equiv \{f \in m\mathcal F : ||f||_p < \infty\}.$$
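For intuition, here is a small numerical sketch of Definition 3.12 with $\mu$ taken to be Lebesgue measure on $[0, 1]$: we approximate $||f||_p$ for $f(x) = x$ by a midpoint Riemann sum (the exact values are $||f||_1 = 1/2$ and $||f||_2 = 1/\sqrt 3$). The helper `lp_norm` is our own illustrative name:

```python
# Approximate ||f||_p = (integral of |f|^p over [0,1])^(1/p) with a midpoint sum.
def lp_norm(f, p, n=200_000):
    total = sum(abs(f((i + 0.5) / n)) ** p for i in range(n)) / n
    return total ** (1.0 / p)

f = lambda x: x
norm_1 = lp_norm(f, 1)  # exact value: 1/2
norm_2 = lp_norm(f, 2)  # exact value: 1/sqrt(3) ~ 0.5774
in_L2 = norm_2 < float("inf")  # f is in L^2(Lebesgue on [0,1])
```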

We can use functional norms to obtain some more desirable properties of integration. Let’s check out Hölder’s inequality:

Theorem 3.3 (Hölder’s inequality)

Suppose that $(\Omega, \mathcal F, \mu)$ is a measure space, that $f, g \in m\mathcal F$, and that $p, q \in (1, \infty)$ are s.t. $\frac1p + \frac1q = 1$. Then:

$$\int |fg| \, d\mu \leq ||f||_p \, ||g||_q.$$

Proof. If $||f||_p = 0$, then note that by definition, $\mu(\{\omega : f(\omega) \neq 0\}) = 0$ and so $f =_{a.e.} 0$, and vice-versa for $||g||_q$.

In this case, the product $fg =_{a.e.} 0$, as $(fg)(\omega) \equiv f(\omega)g(\omega) =_{a.e.} 0$ as well, satisfying the inequality.

Therefore, suppose that $||f||_p, ||g||_q > 0$. Further, WLOG, assume that $||f||_p = ||g||_q = 1$. Note that this applies generally, as we could simply take $\tilde f(\omega) = \frac{f(\omega)}{||f||_p}$ and vice-versa for $g$, so that $||\tilde f||_p = ||\tilde g||_q = 1$ while $|f(\omega)g(\omega)| = ||f||_p ||g||_q |\tilde f(\omega)\tilde g(\omega)|$, which is well-defined since by supposition $||f||_p, ||g||_q > 0$.

Note that for any $x, y > 0$ (the case $xy = 0$ trivially satisfies the bound below), using basic properties of the $\exp$ and $\log$ functions:

$$xy = \exp(\log(xy)) = \exp(\log x + \log y) = \exp\left(\frac1p p\log x + \frac1q q \log y\right) = \exp\left(\frac1p \log x^p + \frac1q \log y^q\right).$$

Notice that $\exp(x)$ is convex, since its second derivative is positive, by Lemma 3.5. Then since $\frac1p + \frac1q = 1$ with $\frac1p, \frac1q \in (0, 1)$, convexity gives:

$$xy \leq \frac1p \exp(\log x^p) + \frac1q \exp(\log y^q) = \frac{x^p}{p} + \frac{y^q}{q}.$$

Taking $x = |f(\omega)|$ and $y = |g(\omega)|$, we see that:

$$|f(\omega)||g(\omega)| = |f(\omega)g(\omega)| \leq \frac{|f(\omega)|^p}{p} + \frac{|g(\omega)|^q}{q},$$

which holds for all $\omega \in \Omega$, so $|fg| \leq \frac{|f|^p}{p} + \frac{|g|^q}{q}$. Integrating:

$$\begin{aligned}\int |fg| \, d\mu &\leq \int \left(\frac{|f|^p}{p} + \frac{|g|^q}{q}\right) d\mu = \frac{||f||_p^p}{p} + \frac{||g||_q^q}{q} \\ &= \frac1p + \frac1q, &&||f||_p = ||g||_q = 1 \text{ WLOG} \\ &= 1, &&\tfrac1p + \tfrac1q = 1 \text{ by supposition} \\ &= ||f||_p \, ||g||_q.\end{aligned}$$
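To see Hölder’s inequality in action numerically (again with Lebesgue measure on $[0,1]$, and the conjugate pair $p = q = 2$, i.e., the Cauchy–Schwarz case), here is an illustrative sketch using midpoint Riemann sums; the helper names are ours:

```python
# Numeric check of Hölder's inequality on [0, 1] with Lebesgue measure,
# f(x) = x, g(x) = 1 - x, and the conjugate pair p = q = 2.
def integral(h, n=100_000):
    """Midpoint Riemann sum of h over [0, 1]."""
    return sum(h((i + 0.5) / n) for i in range(n)) / n

f = lambda x: x
g = lambda x: 1.0 - x
p = q = 2.0

lhs = integral(lambda x: abs(f(x) * g(x)))            # ~ 1/6
f_p = integral(lambda x: abs(f(x)) ** p) ** (1 / p)   # ~ 1/sqrt(3)
g_q = integral(lambda x: abs(g(x)) ** q) ** (1 / q)   # ~ 1/sqrt(3)

holder_holds = lhs <= f_p * g_q + 1e-12
```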

3.2.2. Convergence Results#

3.2.2.1. Convergence Concepts#

Next, we get to the convergence theorems for integrals. To do this, we first need two quick definitions to get us started:

Definition 3.14 (Convergence Almost Everywhere)

Suppose the measure space $(\Omega, \mathcal F, \mu)$, and the measurable functions $f, f_n \in m\mathcal F$, for all $n \in \mathbb N$. We say that $f_n \xrightarrow[n \rightarrow \infty]{a.e.} f$ if:

$$\mu\left(\left\{\omega \in \Omega : \lim_{n \rightarrow \infty} f_n(\omega) \neq f(\omega)\right\}\right) \equiv \mu\left(\lim_{n \rightarrow \infty} f_n \neq f\right) = 0.$$

The idea here is that for all but a set of measure $0$, $f_n(\omega) \xrightarrow[n \rightarrow \infty]{} f(\omega)$. Stated another way, we have pointwise convergence of $f_n$ to $f$ almost everywhere.

We can also understand this definition using the concept of the lim sup of a set, in an ϵ sort of way like you are used to in real analysis:

Definition 3.15 (Equivalent definition for convergence almost everywhere)

Suppose the measure space $(\Omega, \mathcal F, \mu)$, and the measurable functions $f, f_n \in m\mathcal F$, for all $n \in \mathbb N$. We say that $f_n \xrightarrow[n \rightarrow \infty]{a.e.} f$ if for every $\epsilon > 0$:

$$\mu\left(\limsup_{n \rightarrow \infty}\{\omega \in \Omega : |f_n(\omega) - f(\omega)| > \epsilon\}\right) = 0.$$

In this definition, the intuition is that we are focusing on a sequence of sets (indexed by $n$) which are the points $\omega$ of the sample space $\Omega$ which are not $\epsilon$-close to $f(\omega)$. These are the sets of the form:

$$\{\omega \in \Omega : |f_n(\omega) - f(\omega)| > \epsilon\}, \quad n \in \mathbb N.$$

Remember that the $\limsup_{n \rightarrow \infty}$ of a sequence of sets can be understood more specifically as the $\inf_n \sup_{m \geq n}$ (for sets: an intersection of unions), so we are concerned with the $\inf$ of sets of the form:

$$F_n = \sup_{m \geq n}\{\omega \in \Omega : |f_m(\omega) - f(\omega)| > \epsilon\} = \bigcup_{m \geq n}\{\omega \in \Omega : |f_m(\omega) - f(\omega)| > \epsilon\}.$$

So, the interpretation of $F_n$ here is that it’s the measurable set of points of the sample space where, for some $m \geq n$, $|f_m(\omega) - f(\omega)| > \epsilon$ (which is the definition of $f_m(\omega)$ not being $\epsilon$-close to $f(\omega)$, an obstruction to the limit being $f(\omega)$). The nuance here is that we use a supremum, which is because it’s not immediately clear that the largest possible set that fulfills this criterion will necessarily be measurable (but, as per Property 2.17, we know for sure that the supremum is measurable).

By construction, since $n$ is increasing, notice that $\{F_n\}_{n \in \mathbb N}$ is monotone non-increasing: $F_n \supseteq F_{n+1}$, for all $n \in \mathbb N$. Intuitively, since it is bounded (from below by $\varnothing$) and monotone non-increasing, the infimum of this sequence of sets exists (by Lemma 2.3). We know for sure that the resulting set is furthermore measurable by just checking Property 2.16.
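A concrete instance of convergence almost everywhere, sketched numerically: on $([0,1], \text{Lebesgue})$, the sequence $f_n(x) = x^n$ converges pointwise to $0$ everywhere except at the single point $x = 1$, which has measure zero:

```python
# Definition 3.14 in action: f_n(x) = x**n on ([0,1], Lebesgue) converges
# pointwise to 0 except at x = 1 (where f_n(1) = 1 for all n), and the
# exceptional set {1} has Lebesgue measure zero -- so f_n -> 0 a.e.
def f_n(n, x):
    return x ** n

limit_inside = f_n(10_000, 0.999)   # essentially 0 for any x in [0, 1)
limit_at_one = f_n(10_000, 1.0)     # exactly 1: the measure-zero exception

converges_inside = limit_inside < 1e-4
exception_at_one = limit_at_one == 1.0
```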

Next, we get to a practically distinct definition, which almost looks the same. This concept is called convergence in measure:

Definition 3.16 (Convergence in measure)

Suppose the measure space $(\Omega, \mathcal F, \mu)$, and the measurable functions $f, f_n \in m\mathcal F$, for all $n \in \mathbb N$. We say that $f_n \xrightarrow[n \rightarrow \infty]{} f$ in measure if for any $\epsilon > 0$:

$$\mu(\{\omega \in \Omega : |f_n(\omega) - f(\omega)| > \epsilon\}) \equiv \mu(|f_n - f| > \epsilon) \xrightarrow[n \rightarrow \infty]{} 0.$$

While these definitions almost look the same, the practical distinction is that the limit, this time, is outside of the measure statement. The idea here is that, as $n$ grows, the measure of the set of points $\omega$ where $f_n(\omega)$ is not $\epsilon$-close to $f(\omega)$ converges to zero. This contrasts with the preceding statement, where the measure of the set of points $\omega$ for which $f_n(\omega)$ fails to eventually stay $\epsilon$-close to $f(\omega)$ is zero. Intuitively, convergence almost everywhere in fact implies convergence in measure (with the slight note that the entire space $\Omega$ must have finite measure). Let’s formalize this up a bit:
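Before the formal lemma, a quick numerical illustration of Definition 3.16 using the same sequence $f_n(x) = x^n$ on $([0,1], \text{Lebesgue})$ with $f = 0$: the “bad set” $\{x : x^n > \epsilon\}$ is the interval $(\epsilon^{1/n}, 1]$, whose measure $1 - \epsilon^{1/n}$ has a closed form and visibly shrinks to $0$:

```python
# Convergence in measure for f_n(x) = x**n on ([0,1], Lebesgue), f = 0:
# the bad set {x : x**n > eps} = (eps**(1/n), 1] has Lebesgue measure
# 1 - eps**(1/n), which -> 0 as n grows.
def bad_set_measure(n, eps):
    """Lebesgue measure of {x in [0,1] : x**n > eps}."""
    return 1.0 - eps ** (1.0 / n)

eps = 0.1
measures = [bad_set_measure(n, eps) for n in (1, 10, 100, 1000)]
shrinking = all(a > b for a, b in zip(measures, measures[1:]))
vanishing = measures[-1] < 0.01
```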

Lemma 3.7 (Convergence almost everywhere implies convergence in measure)

Suppose the measure space $(\Omega, \mathcal F, \mu)$, and the measurable functions $f, f_n \in m\mathcal F$, for all $n \in \mathbb N$, where $\{f_n\}_{n \in \mathbb N} \subseteq m\mathcal F$. If $f_n \xrightarrow[n \rightarrow \infty]{a.e.} f$ and $\mu(\Omega) < \infty$, then $f_n \xrightarrow[n \rightarrow \infty]{} f$ in measure.

Proof. Suppose that $f_n \xrightarrow[n \rightarrow \infty]{a.e.} f$.

Let $\Omega_1 \equiv \{\omega \in \Omega : \lim_{n \rightarrow \infty} f_n(\omega) = f(\omega)\}$, and $\Omega_1^c = \Omega \setminus \Omega_1$.

By definition of $\xrightarrow[n \rightarrow \infty]{a.e.}$, $\mu(\Omega_1^c) = 0$. Fix $\epsilon > 0$, and consider the sequence of sets:

$$F_n \equiv \bigcup_{m \geq n}\{|f_m - f| > \epsilon\}.$$

By construction, note that $F_n \supseteq F_{n+1}$, where $F_n \downarrow F = \bigcap_{n \in \mathbb N} F_n$.

Further, note that by design, for any $\omega \in \Omega_1$, $\lim_{n \rightarrow \infty} f_n(\omega) = f(\omega)$. Then there exists $N_\epsilon$ s.t. for all $n \geq N_\epsilon$, $|f_n(\omega) - f(\omega)| \leq \epsilon$, by definition of a limit.

Then for all $n > N_\epsilon$, $\omega \notin F_n$, and consequently $\omega \notin F$.

Then $F \cap \Omega_1 = \varnothing$; that is, $F \subseteq \Omega_1^c$.

Then by monotonicity of measures, $\mu(F) \leq \mu(\Omega_1^c) = 0$, by the way we constructed $\Omega_1^c$. Further, since measures are lower-bounded by $0$, this implies that $\mu(F) = 0$.

Then since $F_n \downarrow F$:

$$\mu(|f_n - f| > \epsilon) \leq \mu(F_n) \xrightarrow[n \rightarrow \infty]{} \mu(F) = 0, \quad \{|f_n - f| > \epsilon\} \subseteq F_n,$$

by the continuity from above property of measures, Property 2.6 (which applies since $\mu(F_1) \leq \mu(\Omega) < \infty$).

We aren’t quite ready to handle the result that $f_n \xrightarrow[n \rightarrow \infty]{} f$ in measure does not imply that $f_n \xrightarrow[n \rightarrow \infty]{a.e.} f$, so we’ll rotate back to this a little later in the course.

Let’s see what these two concepts will allow us to do.

3.2.2.2. Convergence Theorems#

Theorem 3.4 (Bounded Convergence)

Suppose the measure space $(\Omega, \mathcal F, \mu)$, where:

  1. $F \in \mathcal F$ is a set of finite measure, $\mu(F) < \infty$,

  2. $\{f_n\}_{n \in \mathbb N} \subseteq m\mathcal F$ is a sequence of measurable functions which vanish on $F^c$; that is, $\omega \in F^c \Rightarrow f_n(\omega) = 0$,

  3. There exists $M$ s.t. $|f_n(\omega)| \leq M$ (the $f_n$ are uniformly bounded), and

  4. $f_n \xrightarrow[n \rightarrow \infty]{} f$ in measure.

Then:

$$\int_F f \, d\mu = \lim_{n \rightarrow \infty} \int_F f_n \, d\mu.$$

The idea here is that we are conceptually moving the limit across the integral: the left hand side can be thought of as the integral of $\lim_{n \rightarrow \infty} f_n$ over $F$, so the theorem says the limit and the integral can be interchanged.

Proof. Suppose that $\epsilon > 0$, and define $G_n \equiv \{\omega \in F : |f_n(\omega) - f(\omega)| \leq \epsilon\}$, and $B_n \equiv F \setminus G_n = \{\omega \in F : |f_n(\omega) - f(\omega)| > \epsilon\}$. Intuitively, $G_n$ are the points of $F$ for a particular $n$ where $f_n(\omega)$ is $\epsilon$-close to $f(\omega)$, and $B_n$ are the points where $f_n(\omega)$ is not.

Then by Theorem 3.1:

$$\begin{aligned}\left|\int_F f \, d\mu - \int_F f_n \, d\mu\right| = \left|\int_F (f - f_n) \, d\mu\right| &\leq \int_F |f - f_n| \, d\mu, &&\text{Jensen's inequality, as } |\cdot| \text{ is convex} \\ &= \int_{G_n}|f - f_n| \, d\mu + \int_{B_n}|f - f_n| \, d\mu, &&G_n \cup B_n = F \text{ disjointly} \\ &\leq \epsilon\,\mu(F) + 2M\,\mu(B_n).\end{aligned}$$

For the left-hand term, we used that for $\omega \in G_n$, $|f_n(\omega) - f(\omega)| \leq \epsilon$ by construction. For the right-hand term, we used that $|f_n| \leq M \Rightarrow |f| \leq M$ (bounded functions can only converge to a bounded function), and consequently we applied the triangle inequality, $|f - f_n| \leq |f| + |f_n| \leq 2M$.

Continuing:

$$\limsup_{n \rightarrow \infty}\left|\int_F f \, d\mu - \int_F f_n \, d\mu\right| \leq \epsilon\,\mu(F),$$

since $\mu(B_n) \xrightarrow[n \rightarrow \infty]{} 0$ by definition of $f_n \rightarrow f$ in measure. Noting that $0 \leq \mu(F) < \infty$ by supposition, and that $\epsilon$ was arbitrary, gives the desired result.

So, intuitively, as the functions fn get closer to f in measure (the sets on which they disagree have measure converging to 0), somewhat intuitively, the integrals converge, too. The key here is that the bounded convergence theorem applies for bounded functions. We have a somewhat equivalent, albeit practically much more applicable, result for non-negative functions:

Lemma 3.8 (Fatou)

Suppose the measure space $(\Omega, \mathcal F, \mu)$, where $\{f_n\}_{n \in \mathbb N} \subseteq m\mathcal F$ is a sequence of non-negative functions; i.e., $f_n \geq 0$. Then:

$$\int \liminf_{n \rightarrow \infty} f_n \, d\mu \leq \liminf_{n \rightarrow \infty} \int f_n \, d\mu.$$

Proof. Define $g_n$ to be the function where $g_n(\omega) \equiv \inf_{m \geq n} f_m(\omega)$. It follows that for all $n \in \mathbb N$, $f_n(\omega) \geq g_n(\omega)$, since $g_n(\omega)$ is the infimum of a set which contains $f_n(\omega)$ by construction.

Note that as $n \rightarrow \infty$, $g_n(\omega) \uparrow g(\omega)$ converges from below to some value $g(\omega)$ (possibly infinite), which follows because $g_{n+1}(\omega) \geq g_n(\omega)$, since $\{f_m(\omega) : m \geq n + 1\} \subseteq \{f_m(\omega) : m \geq n\}$ ($\{g_n(\omega)\}_n$ is monotonically non-decreasing in $n$).

Then:

$$g_n(\omega) \uparrow g(\omega) = \sup_n g_n(\omega) = \liminf_{n \rightarrow \infty} f_n(\omega).$$

Since $\int f_n \, d\mu \geq \int g_n \, d\mu$ by Corollary 3.1, it is sufficient to show that:

$$\liminf_{n \rightarrow \infty} \int g_n \, d\mu \geq \int g \, d\mu.$$

Let $\{F_m\} \subseteq \mathcal F$ be a sequence of sets of finite measure, where $F_m \uparrow \Omega$ as $m \rightarrow \infty$. Since $g_n \geq 0$, then for fixed $m$:

$$(g_n(\omega) \wedge m)\,1_{F_m}(\omega) \xrightarrow[n \rightarrow \infty]{} (g(\omega) \wedge m)\,1_{F_m}(\omega).$$

Then:

$$\begin{aligned}\liminf_{n \rightarrow \infty} \int g_n \, d\mu &\geq \liminf_{n \rightarrow \infty} \int (g_n \wedge m)\,1_{F_m} \, d\mu, &&0 \leq g_n \wedge m \leq g_n \\ &= \liminf_{n \rightarrow \infty} \int_{F_m} (g_n \wedge m) \, d\mu = \int_{F_m} (g \wedge m) \, d\mu.\end{aligned}$$

Note that the integrands are bounded above by $m$ and vanish off the finite-measure set $F_m$, so Theorem 3.4 applies in the bottom line.

Taking the $\sup$ over $m$:

$$\liminf_{n \rightarrow \infty} \int g_n \, d\mu \geq \sup_m \int_{F_m} (g \wedge m) \, d\mu = \int g \, d\mu.$$

Notice that by definition of convergence from below, Lemma 3.2 applies in the last equality, and we are finished.

This is clearly much more general than the Bounded Convergence Theorem: the only restriction we have here is that we have a sequence of non-negative functions; we don’t need a sequence of functions which is bounded and converging in measure.
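It’s also worth seeing that Fatou’s inequality can be strict. The classic “escaping bump” $f_n = n \cdot 1_{(0, 1/n)}$ on $([0,1], \text{Lebesgue})$ converges pointwise to $0$, so the left side of Fatou is $0$, while every $\int f_n \, d\mu = 1$. A sketch, using the closed-form integrals rather than numerical ones:

```python
# Strict inequality in Fatou: the escaping bump f_n = n * 1_{(0, 1/n)} on
# ([0,1], Lebesgue). Pointwise f_n -> 0, so integral of liminf f_n is 0,
# yet every integral of f_n equals 1.
def f_n(n, x):
    return n if 0.0 < x < 1.0 / n else 0.0

def integral_f_n(n):
    return n * (1.0 / n)  # height n times width 1/n = 1 for every n

pointwise_limit_at_half = f_n(10, 0.5)        # 0.0 once 1/n < 0.5
integrals = [integral_f_n(n) for n in (1, 10, 100)]
liminf_of_integrals = min(integrals)          # = 1
integral_of_liminf = 0.0                      # integral of the zero function
fatou_strict = integral_of_liminf < liminf_of_integrals
```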

When the functions are converging from below, we can further clarify the nature of the convergence with a slight extension of Fatou’s Lemma: the integrals will also converge from below. In other words, monotonically converging functions have monotonically converging integrals:

Theorem 3.5 (Monotone Convergence (MCT))

Suppose the measure space $(\Omega, \mathcal F, \mu)$, where $\{f_n\}_{n \in \mathbb N} \subseteq m\mathcal F$ is a sequence of non-negative functions; i.e., $f_n \geq 0$, where $f_n \uparrow f$ $\mu$-a.e. ($f_n$ is monotone increasing to $f$). Then:

$$\int f_n \, d\mu \uparrow \int f \, d\mu,$$

as $n \rightarrow \infty$.

As you notice, this theorem statement looks a lot like the statement from Fatou’s Lemma Lemma 3.8, and in fact, we’ll use some of the intuition from Fatou’s Lemma to make this proof rigorous:

Proof. By Fatou’s Lemma Lemma 3.8, $f_n \geq 0$ implies that:

$$\liminf_{n \rightarrow \infty} \int f_n \, d\mu \geq \int \liminf_{n \rightarrow \infty} f_n \, d\mu = \int f \, d\mu,$$

since $f_n \uparrow f$ as $n \rightarrow \infty$ by supposition.

Conversely, as $f_n \leq_{a.e.} f$ for all $n \in \mathbb N$, we see that by Corollary 3.1:

$$\limsup_{n \rightarrow \infty} \int f_n \, d\mu \leq \int f \, d\mu.$$

Together, this gives that:

$$\lim_{n \rightarrow \infty} \int f_n \, d\mu = \int f \, d\mu,$$

which is because we have $\limsup_{n \rightarrow \infty} \int f_n \, d\mu \leq \int f \, d\mu \leq \liminf_{n \rightarrow \infty} \int f_n \, d\mu$, which can only hold with equality throughout, since the $\limsup$ is always greater than or equal to the $\liminf$ of the same sequence.

Finally, note that $0 \leq f_n \uparrow f$ $\mu$-a.e. as $n \rightarrow \infty$ implies that $0 \leq_{a.e.} f_n \leq_{a.e.} f_{n+1}$, so by Corollary 3.1, we have that:

$$\int f_n \, d\mu \leq \int f_{n+1} \, d\mu$$

for all $n \in \mathbb N$.

Then $\{\int f_n \, d\mu\}_n$ is a monotone non-decreasing sequence, and its limit is $\int f \, d\mu$, as desired.
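A quick closed-form illustration of MCT: on $([0,1], \text{Lebesgue})$, take $f(x) = x^{-1/2}$ (integrable, with $\int f \, d\mu = 2$) and the monotone truncations $f_n = f \cdot 1_{[1/n, 1]} \uparrow f$; the integrals $2 - 2/\sqrt n$ increase to $2$:

```python
# MCT sketch: f(x) = x**(-1/2) on ([0,1], Lebesgue), f_n = f * 1_{[1/n, 1]}.
# The f_n increase pointwise to f, and their integrals increase to 2.
import math

def integral_f_n(n):
    # closed form: integral of x**(-1/2) over [1/n, 1] = 2 - 2/sqrt(n)
    return 2.0 - 2.0 / math.sqrt(n)

values = [integral_f_n(n) for n in (1, 4, 100, 10_000)]
monotone_increasing = all(a <= b for a, b in zip(values, values[1:]))
approaches_two = abs(values[-1] - 2.0) < 0.05
```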

Next, we’ll see another application of Fatou’s Lemma, which is called the Dominated Convergence Theorem. Basically, what this theorem asserts is that if a sequence of measurable functions $f_n$ converges almost everywhere to another function $f$, and can be dominated by a $\mu$-integrable function, then the integrals converge, too:

Theorem 3.6 (Dominated Convergence (DCT))

Suppose the measure space $(\Omega, \mathcal F, \mu)$, where:

  1. $\{f_n\}_{n \in \mathbb N} \subseteq m\mathcal F$,

  2. $f \in m\mathcal F$ is a function where $f_n \xrightarrow[n \rightarrow \infty]{a.e.} f$,

  3. $g$ is $\mu$-integrable, and

  4. $|f_n| \leq g$ for all $n \in \mathbb N$.

Then:

$$\int f_n \, d\mu \xrightarrow[n \rightarrow \infty]{} \int f \, d\mu.$$

The idea here is that $g$ is dominating the $f_n$ by condition 4. Consequently, since the $f_n$ are converging a.e. to $f$, $g$ dominates $f$ too ($\mu$-a.e.), which implies that $f$ (and, further, the entire sequence $\{f_n\}$) is $\mu$-integrable. Intuitively, this is because the integral of the absolute value of any function dominated by $g$ is at most $\int g \, d\mu$.

Proof. Note that since $|f_n| \leq g$, then $f_n + g \geq 0$, and consequently, Fatou’s lemma applies. Then:

$$\begin{aligned}\liminf_{n \rightarrow \infty} \int (f_n + g) \, d\mu &\geq \int \liminf_{n \rightarrow \infty} (f_n + g) \, d\mu, &&\text{Fatou} \\ &= \int (f + g) \, d\mu, &&f_n \xrightarrow{a.e.} f.\end{aligned}$$

By subtracting $\int g \, d\mu$ from both sides:

$$\liminf_{n \rightarrow \infty} \int f_n \, d\mu \geq \int f \, d\mu,$$

which follows by Property 3.12.

Applying the same approach to $-f_n$ (note $-f_n + g \geq 0$ as well), we obtain that $\liminf_{n \rightarrow \infty} \int (-f_n) \, d\mu \geq \int (-f) \, d\mu$, which implies that $\limsup_{n \rightarrow \infty} \int f_n \, d\mu \leq \int f \, d\mu$.

Since $\limsup_{n \rightarrow \infty} \int f_n \, d\mu \leq \int f \, d\mu \leq \liminf_{n \rightarrow \infty} \int f_n \, d\mu$, equality must hold throughout, since the $\limsup$ is always greater than or equal to the $\liminf$ of the same sequence.
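A small closed-form illustration of DCT: on $([0,1], \text{Lebesgue})$, $f_n(x) = x^n$ is dominated by the integrable constant $g = 1$ and converges a.e. to $0$, so $\int f_n \, d\mu = \frac{1}{n+1}$ must converge to $0$:

```python
# DCT sketch on ([0,1], Lebesgue): f_n(x) = x**n is dominated by g = 1
# (integrable since mu([0,1]) = 1), and f_n -> 0 a.e., so the integrals
# 1/(n+1) converge to the integral of 0, i.e. to 0.
def integral_f_n(n):
    return 1.0 / (n + 1)  # closed form for the integral of x**n over [0, 1]

dominated = all(x ** n <= 1.0 for n in (1, 5, 50) for x in (0.0, 0.3, 0.99, 1.0))
integrals = [integral_f_n(n) for n in (1, 10, 1000)]
converges_to_zero = integrals[-1] < 1e-2
```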

3.2.3. Measure Restriction#

The final building block we will need in integration is the concept of measure restrictions. As its name somewhat suggests, a measure restriction basically lets us take an existing measure space $(\Omega, \mathcal F, \mu)$, and define a new measure space from only a subset $F \subseteq \Omega$ that agrees with $\mu$ on a new $\sigma$-algebra that is, intuitively, induced by $F$. In some sense, this is kind of the opposite of the extension theorems we saw previously.

If you recall, we built machinery that worked on $\sigma$-algebras by building machinery on much simpler families of sets (such as algebras), and then simply argued that the machinery, by construction, also had to work on the generated $\sigma$-algebra. Here, we’re going to take existing machinery that works on a measure space, and show that we can take suitable subsets of the sample space $\Omega$ and use them to produce new measure spaces.

Let’s give this idea a go:

Corollary 3.5 (Defining a measure by restriction)

Suppose that $(\Omega, \mathcal F, \mu)$ is a measure space, and let:

  1. $\{F_n\}_{n \in \mathbb N} \subseteq \mathcal F$ be a set of disjoint events,

  2. $F = \bigcup_{n \in \mathbb N} F_n$, and

  3. $f \in m\mathcal F$ be a $\mu$-integrable function.

Then:

$$\sum_{n=1}^\infty \int_{F_n} f \, d\mu = \int_F f \, d\mu.$$

Further, if $f \geq 0$, then for any $A \in \mathcal F$ s.t. $A \subseteq F$, $\nu(A) = \int_A f \, d\mu$ defines a measure on $(F, \mathcal F_F)$, where:

$$\mathcal F_F \equiv \{A \in \mathcal F : A \subseteq F\}.$$

Proof. Define $f_m \equiv f\,1_{\bigcup_{n \in [m]} F_n}$, and let $f_F \equiv f\,1_F$. Then:

  1. $f_m \xrightarrow[m \rightarrow \infty]{} f_F$,

  2. $|f_m| \leq |f|$ for all $m \in \mathbb N$ by construction, since $\bigcup_{n \in [m]} F_n \subseteq F$, and

  3. $\int |f| \, d\mu < \infty$, since by supposition, $f$ is $\mu$-integrable.

Then by the Dominated Convergence Theorem 3.6:

$$\begin{aligned}\int_F f \, d\mu = \int f_F \, d\mu &= \lim_{m \rightarrow \infty} \int f\,1_{\bigcup_{n \in [m]} F_n} \, d\mu, &&\text{DCT} \\ &= \lim_{m \rightarrow \infty} \sum_{n \in [m]} \int_{F_n} f \, d\mu, &&F_n \text{ are disjoint} \\ &= \sum_{n=1}^\infty \int_{F_n} f \, d\mu.\end{aligned}$$

The second-to-last equality follows by noting that $\int_{\bigcup_{n \in [m]} F_n} f \, d\mu = \sum_{n \in [m]} \int 1_{F_n} f \, d\mu$ by the disjointness of $\{F_n\}$.

Then by construction, $\nu$ is countably additive.

If further $f \geq 0$, then $\nu \geq 0$ by construction, indicating that $\nu$ is a measure, since it is countably additive and non-negative.

We can repeat this argument for any $A \subseteq F$ with $A \in \mathcal F$ and a countable disjoint sequence of events $\{A_n\}$ whose union is $A$ to obtain that the desired result holds for any $A \in \mathcal F_F$.

To see that $(F, \mathcal F_F)$ is a measurable space, all that’s left is to show that $\mathcal F_F$ is a $\sigma$-algebra on $F$:

1. Contains $F$: Since $F \in \mathcal F$ and $F \subseteq F$, then $F \in \mathcal F_F$ by definition.

2. Closed under complements: Suppose that $A \in \mathcal F_F$.

Define $A_F^c = F \setminus A \equiv F \cap A^c$ ($A_F^c$ is the complement of $A$ in $F$, and $A^c$ is the complement of $A$ in $\Omega$).

Notice that as $F, A^c \in \mathcal F$, then $A_F^c \in \mathcal F$, as $\mathcal F$ is a $\sigma$-algebra (and hence closed under intersections and complements).

Since $F \cap A^c \subseteq F$, then $A_F^c = F \cap A^c \in \mathcal F_F$, by definition.

3. Closed under countable unions: Suppose that $\{A_n\} \subseteq \mathcal F_F$ is a sequence of sets.

Then $A = \bigcup_n A_n \subseteq F$, because element-wise, each $A_n \subseteq F$, and hence the union cannot exceed $F$.

Since $\mathcal F$ is a $\sigma$-algebra, $A \in \mathcal F$, and $A \subseteq F$.

Then $A \in \mathcal F_F$, by definition.
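Finally, a numerical sketch of the restriction measure $\nu$ from Corollary 3.5: take $\mu$ to be Lebesgue measure on $[0,1]$, $f(x) = 2x \geq 0$, and the disjoint dyadic partition $F_n = [2^{-n}, 2^{-(n-1)})$ of $F = (0, 1)$; countable additivity says $\sum_n \nu(F_n) = \nu(F) = 1$. All integrals below are in closed form:

```python
# Restriction measure sketch: mu = Lebesgue on [0,1], f(x) = 2x >= 0, and
# nu(A) = integral of f over A. On the dyadic partition F_n = [2**-n, 2**-(n-1))
# of F = (0, 1), the pieces nu(F_n) should sum (telescope) to nu(F) = 1.
def nu_interval(a, b):
    """nu([a, b)) = integral of 2x over [a, b) = b**2 - a**2 (closed form)."""
    return b ** 2 - a ** 2

pieces = [nu_interval(2.0 ** -n, 2.0 ** -(n - 1)) for n in range(1, 40)]
total = sum(pieces)  # telescopes to 1 - 4**-39, essentially nu((0,1)) = 1
additivity_holds = abs(total - nu_interval(0.0, 1.0)) < 1e-9
```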