3.3. Expectation
In this section, we'll learn about a particular type of integral. Remember that an integral is defined on a measurable space and with respect to a measure; e.g., the measure space $(\Omega, \mathcal{F}, \mu)$.
When this measure is further a probability measure (Definition 2.35), we call this integral something special: we call it an expectation, or expected value. Let's start with a definition.
Definition 3.17 (Expected value)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X : \Omega \rightarrow \mathbb{R}$ is a random variable. Then the expected value of $X$ is the integral of $X$ with respect to $\mathbb{P}$:

$$\mathbb{E}[X] \triangleq \int_\Omega X \,\mathrm{d}\mathbb{P}.$$
Notationally, you might see this quantity written in several different ways in this book. When we are only talking about a single variable, we'll usually just write $\mathbb{E}[X]$.
In previous courses you might have taken on probability or statistics, you may have seen more complicated notations for an expected value (things like subscripts, which get extraordinarily taxing because nearly every statistician/probabilist seems to have a different understanding of what goes in the subscript). For now, we're going to do our best to just omit this cumbersome notation and keep it simple: when you see an expectation of a random variable, you can always just think of it as an integral with respect to the corresponding probability space for that random variable. When we learn about product spaces, we'll recap again what it means when we talk about expectations involving multiple random variables.
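Since this is our first expectation, let's compute one concretely. Below is a minimal sketch (not from the text; the fair-die setup is an arbitrary choice) showing that on a finite probability space, the integral defining $\mathbb{E}[X]$ collapses to a probability-weighted sum:

```python
import numpy as np

# A minimal sketch: on a *finite* probability space, the integral defining
# E[X] reduces to a probability-weighted sum.
# Model a fair six-sided die: Omega = {1,...,6}, P({w}) = 1/6, X(w) = w.
omega = np.arange(1, 7)          # sample space
prob = np.full(6, 1 / 6)         # probability measure on singletons
X = omega.astype(float)          # the random variable (identity map here)

# E[X] = integral of X dP = sum over omega of X(w) * P({w})
expectation = np.sum(X * prob)
print(expectation)               # 3.5
```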
Now, let’s get some lingo under our belt for expectations. Let’s start with the term “existence of an expectation”:
Definition 3.18 (Existence of expected value)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ is a random variable. We say that the expected value of $X$ exists whenever $\mathbb{E}[|X|] < \infty$.
In words, what we are saying with this wording is just that the expectation of a random variable exists when the random variable is $\mathbb{P}$-integrable; i.e., when the expectation of its absolute value is finite.
Now that we have some of the basics under our belt, I'd recommend you check back to our study of integration in Section 3.1 and Section 3.2.
Throughout this section, we are basically going to regurgitate, ad nauseam, the properties, theorems, lemmas, etc. directly from Section 3.1 and Section 3.2. This is because, as you can see, expectations are just integrals, so all of the properties of integrals that we learned about over there apply here, too. Further, since the measure in this case is a probability measure, we will be able to be even more specific about some aspects of the integral, and will be able to prove several bigger results here.
3.3.1. Expectation corollaries
To start off, we need to introduce a bit of new notation. You are already immediately familiar with this concept, but with a slightly different word for it:
Definition 3.19 (Relation holds almost surely)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that some relation (e.g., an equality or inequality between random variables) holds for every $\omega \in \Omega \setminus N$, where $N \in \mathcal{F}$ and $\mathbb{P}(N) = 0$.

We write that the relation holds almost surely, abbreviated a.s.
You will notice that this statement is basically exactly the same as the statement that we made when it came to a relation holding almost everywhere back in Definition 3.3, except for the fact that the domain in this case is a probability space. Again, if a relation holds almost surely, it also holds almost everywhere, but not necessarily the reverse (for the reason we gave back when we first introduced almost sure statements in Definition 2.37).
Property 3.13 (Expectation basics)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $Y$ are random variables where either $X, Y \geq 0$ (they are non-negative), or $\mathbb{E}[|X|], \mathbb{E}[|Y|] < \infty$ (they are $\mathbb{P}$-integrable).

Then:

1. $\mathbb{E}[a] = a$, for any constant $a \in \mathbb{R}$,
2. $\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y]$, for any $a, b \in \mathbb{R}$ (any $a, b \geq 0$ in the non-negative case), and
3. If $X \leq Y$ almost surely, then $\mathbb{E}[X] \leq \mathbb{E}[Y]$.
The proofs of these statements are extremely easy: we've already done them! Remembering that expectations are just integrals, we can directly borrow our results from Section 3.1 and Section 3.2:
Proof. If $X, Y \geq 0$:

1. Direct application of Property 3.9.
2. Direct application of Property 3.8, and Remark 3.1.
3. Direct application of Corollary 3.1 for non-negative functions.

If $X, Y$ are $\mathbb{P}$-integrable:

1. Direct application of Property 3.12.
2. Direct application of Property 3.11, and Remark 3.1.
3. Direct application of Corollary 3.1 for $\mathbb{P}$-integrable functions.
That was pretty easy, right?
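While we're at it, these properties are easy to sanity-check by simulation. Here's a minimal sketch (the distributions and constants are arbitrary illustrative choices, not from the text) checking Property 3.13(2) and 3.13(3) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Arbitrary choices for illustration: X ~ Exponential(1), Y = X + |Z| >= X.
X = rng.exponential(1.0, n)
Y = X + np.abs(rng.normal(size=n))

# Property 3.13(2): E[aX + bY] = a E[X] + b E[Y] (up to Monte Carlo error).
a, b = 2.0, -3.0
print(np.mean(a * X + b * Y), a * np.mean(X) + b * np.mean(Y))

# Property 3.13(3): X <= Y everywhere, so E[X] <= E[Y].
print(np.mean(X) <= np.mean(Y))   # True
```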
3.3.2. Norms and Convexity
All of the things we learned about norms and convexity hold over to expectations, too:
Theorem 3.7 (Jensen’s inequality for random variables)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, that $X$ is a random variable with $\mathbb{E}[|X|] < \infty$, and that $\varphi : \mathbb{R} \rightarrow \mathbb{R}$ is a convex function. Then:

$$\varphi\left(\mathbb{E}[X]\right) \leq \mathbb{E}\left[\varphi(X)\right].$$
This is just a less generic restatement of what we described when we first saw Jensen's inequality in Theorem 3.2. The fine points here are that we asserted that $X$ is integrable (so that $\mathbb{E}[X]$ is finite and $\varphi(\mathbb{E}[X])$ makes sense), and that the underlying measure is a probability measure, which is exactly the setting Jensen's inequality requires.
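To see Jensen's inequality in action, here's a quick Monte Carlo illustration (a minimal sketch; the choice of $\varphi = \exp$ and a standard normal $X$ is arbitrary, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

# Jensen: phi(E[X]) <= E[phi(X)] for convex phi; here phi = exp.
lhs = np.exp(np.mean(X))        # ~ exp(0) = 1
rhs = np.mean(np.exp(X))        # ~ E[e^X] = e^{1/2} ~ 1.6487 for X ~ N(0,1)
print(lhs, rhs, lhs <= rhs)     # True
```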
There are two special cases that will come up again and again in this book, so we just want to draw your attention to them:
Corollary 3.6 (Corollaries of Jensen’s inequality)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ is a random variable with $\mathbb{E}[|X|] < \infty$. Then:

1. $\left|\mathbb{E}[X]\right| \leq \mathbb{E}[|X|]$,

and further if $\mathbb{E}[X^2] < \infty$, then:

2. $\mathbb{E}[X]^2 \leq \mathbb{E}[X^2]$.
In the second statement, notice that I got a little bit ambitious with my notation: to some readers, it might be unclear what I mean when I write $\mathbb{E}[X]^2$. By this we mean $(\mathbb{E}[X])^2$, the square of the expectation, and not $\mathbb{E}[X^2]$, the expectation of the square.
Next, let’s investigate the concept of a norm:
Definition 3.20 (Norm of a random variable)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, that $X$ is a random variable, and that $p \in [1, \infty)$. Then the $p$-norm of $X$ is:

$$\lVert X \rVert_p \triangleq \left(\mathbb{E}\left[|X|^p\right]\right)^{1/p},$$

which is just about exactly the definition you got accustomed to in Definition 3.12, but for random variables defined on a probability space.
In this case, we introduced a slightly new piece of terminology: the infinity norm $\lVert X \rVert_\infty \triangleq \inf\{c \geq 0 : |X| \leq c \text{ almost surely}\}$, the smallest constant that bounds $|X|$ almost surely.
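As a quick illustration of these norms (a minimal sketch; the uniform distribution is an arbitrary choice, not from the text), we can estimate a few $p$-norms by Monte Carlo and compare them to their closed forms:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=1_000_000)   # X ~ Uniform(-1, 1)

# ||X||_p = (E[|X|^p])^{1/p}; for Uniform(-1,1), E[|X|^p] = 1/(p+1).
for p in (1, 2, 4):
    est = np.mean(np.abs(X) ** p) ** (1 / p)
    exact = (1 / (p + 1)) ** (1 / p)
    print(p, est, exact)

# ||X||_inf is the essential supremum: 1 here, approached as p grows.
print(np.mean(np.abs(X) ** 64) ** (1 / 64))  # approaching 1
```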
Theorem 3.8 (Hölder’s inequality)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, that $X$ and $Y$ are random variables, and that $p, q \in [1, \infty]$ are s.t. $\frac{1}{p} + \frac{1}{q} = 1$. Then:

$$\mathbb{E}[|XY|] \leq \lVert X \rVert_p \lVert Y \rVert_q,$$
which is basically just Theorem 3.3.
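Here's a Monte Carlo sanity check of Hölder's inequality (a minimal sketch; the distributions and the choice $p = q = 2$, the Cauchy-Schwarz case, are arbitrary, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(size=n)
Y = rng.exponential(1.0, size=n)

# Hölder with p = q = 2 (Cauchy-Schwarz): E|XY| <= ||X||_2 ||Y||_2.
lhs = np.mean(np.abs(X * Y))
rhs = np.sqrt(np.mean(X**2)) * np.sqrt(np.mean(Y**2))
print(lhs, rhs, lhs <= rhs)   # True
```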
3.3.3. Subsets of the domain
When we are considering subsets of the domain, we have a special notation in probability theory:
Definition 3.21 (Expectation over a subset of the domain)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, that $X$ is a random variable, and that $A \in \mathcal{F}$. Then the expectation of $X$ over $A$ is:

$$\mathbb{E}[X; A] \triangleq \int_A X \,\mathrm{d}\mathbb{P} = \mathbb{E}\left[X \mathbb{1}_A\right].$$
The basic idea is that $\mathbb{E}[X; A]$ only "listens to" the behavior of $X$ on the subset $A$ of the domain: we integrate $X$ over $A$ rather than over all of $\Omega$.

Further, when the random variable we are integrating is itself an indicator, the expectation takes a particularly simple form:
Definition 3.22 (Expectation of an indicator)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $A \in \mathcal{F}$. Then:

$$\mathbb{E}\left[\mathbb{1}_A\right] = \int_\Omega \mathbb{1}_A \,\mathrm{d}\mathbb{P} = \mathbb{P}(A).$$
Remember that indicators were how we defined simple functions back in Definition 3.1. Therefore if we have a measure that is a probability measure, the integral of the indicator of a set is just the probability of that set.
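This identity is also the workhorse of simulation: the sample mean of an indicator is just the fraction of samples landing in the set, which estimates the set's probability. A minimal sketch (the normal distribution and the set $\{X > 1\}$ are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)

# E[1_A] = P(A): estimate P(X > 1) as the mean of the indicator of {X > 1}.
indicator = (X > 1.0).astype(float)
print(np.mean(indicator))   # ~0.1587, i.e. 1 - Phi(1) for a standard normal
```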
Next up, let's see how we can use this notation to derive a neat result known as Markov's inequality:
Theorem 3.9 (Markov’s inequality)
Suppose that:
- $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space,
- $X$ is a random variable,
- $g : \mathbb{R} \rightarrow \mathbb{R}$ is measurable and s.t. $g \geq 0$ ($g$ is a non-negative function), and
- $A$ is a measurable set of the codomain of $X$. Define $i_A \triangleq \inf_{x \in A} g(x)$, and suppose $i_A > 0$.

Then:

$$\mathbb{P}(X \in A) \leq \frac{\mathbb{E}[g(X)]}{i_A}.$$
There’s quite a bit going on in that statement, so let’s break it down where you’re probably starting to get confused. In this statement, remember that
Proof. Note that since $g \geq 0$ and $g(x) \geq i_A$ for every $x \in A$:

$$i_A \mathbb{1}\{X \in A\} \leq g(X)\,\mathbb{1}\{X \in A\} \leq g(X).$$

Further, notice that $\mathbb{E}\left[\mathbb{1}\{X \in A\}\right] = \mathbb{P}(X \in A)$, which is because of Definition 3.22.

Then with Property 3.13(3), taking the expectation preserves the inequalities, and by Property 3.13(2), rescaling by $\frac{1}{i_A}$ gives:

$$\mathbb{P}(X \in A) \leq \frac{\mathbb{E}[g(X)]}{i_A}.$$
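Here's a quick numerical check of Markov's inequality (a minimal sketch; we take the classic special case $g(x) = x$ with a non-negative $X$ and $A = [a, \infty)$, so $i_A = a$; the exponential distribution is an arbitrary choice, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(1.0, size=1_000_000)   # non-negative, E[X] = 1

# Markov with g(x) = x and A = [a, inf): P(X >= a) <= E[X] / a.
for a in (1.0, 2.0, 5.0):
    print(a, np.mean(X >= a), np.mean(X) / a)   # empirical P vs. the bound
```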
Pretty easy, right? Next up, we’ll see a corollary of Markov’s inequality, called Chebyshev’s inequality:
Corollary 3.7 (Chebyshev’s inequality)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, that $X$ is a random variable, and that $\epsilon > 0$. Then:

$$\mathbb{P}(|X| \geq \epsilon) \leq \frac{\mathbb{E}[X^2]}{\epsilon^2}.$$
What we are saying here is that we can find a direct relationship between the probability of the portion of the domain where $|X| \geq \epsilon$ and the second moment $\mathbb{E}[X^2]$: the bigger $\epsilon$ is, the smaller that probability must be.
Proof. Define $g(x) \triangleq x^2$ and $A \triangleq \{x : |x| \geq \epsilon\}$, so that $\{X \in A\} = \{|X| \geq \epsilon\}$.

Note that $i_A = \inf_{x \in A} x^2 = \epsilon^2 > 0$.
Applying Theorem 3.9 gives the desired result.
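And a numerical check of Chebyshev's inequality (again a minimal sketch with an arbitrary choice of distribution, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)   # E[X^2] = 1

# Chebyshev: P(|X| >= eps) <= E[X^2] / eps^2.
for eps in (1.0, 2.0, 3.0):
    print(eps, np.mean(np.abs(X) >= eps), np.mean(X**2) / eps**2)
```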
In practice, you might see Markov’s inequality called Chebyshev’s inequality, but we’ll stick to the nomenclature we described so that we can be explicit.
Another powerful corollary of Markov's inequality is that if the expectation of the absolute value of a random variable is finite, then the random variable is finite almost surely:
Lemma 3.9 (Bounded absolute expectation)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ is a random variable with $\mathbb{E}[|X|] < \infty$. Then $|X| < \infty$ almost surely.
Notice that here, we slightly abused the fact that if an event (here, finiteness) happens almost surely, it also occurs almost everywhere (the space on which finiteness does not hold has probability $0$, and hence measure $0$).
3.3.4. Convergence concepts
Just like we were able to directly extend results on the basics of integration and the results on norms and convexity, we can do the same thing with convergence concepts. Let’s see how this works:
Lemma 3.10 (Fatou)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $(X_n)_{n \in \mathbb{N}}$ is a sequence of non-negative random variables. Then:

$$\mathbb{E}\left[\liminf_{n \rightarrow \infty} X_n\right] \leq \liminf_{n \rightarrow \infty} \mathbb{E}[X_n].$$
Proof. Direct application of Lemma 3.8.
3.3.4.1. Convergence almost surely and convergence in probability
To understand the monotone convergence theorem and some successive results, we'll just make unambiguously clear the term almost sure convergence, which is basically the probability-space version of the term from Definition 3.14:
Definition 3.23 (Convergence almost surely)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $(X_n)_{n \in \mathbb{N}}$ are random variables. We say that $X_n$ converges to $X$ almost surely, written $X_n \xrightarrow{\text{a.s.}} X$, if:

$$\mathbb{P}\left(\left\{\omega \in \Omega : X_n(\omega) \rightarrow X(\omega)\right\}\right) = 1.$$
So, the idea here is that a random variable converges almost surely if the set of points $\omega$ at which the sequence $X_n(\omega)$ converges to $X(\omega)$ has probability $1$.
We can also understand this definition using the concept of the limit superior of a sequence of sets:
Definition 3.24 (Equivalent definition for convergence almost surely)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $(X_n)_{n \in \mathbb{N}}$ are random variables. Then $X_n \xrightarrow{\text{a.s.}} X$ if and only if, for every $\epsilon > 0$:

$$\mathbb{P}\left(\limsup_{n \rightarrow \infty} \left\{\omega \in \Omega : |X_n(\omega) - X(\omega)| > \epsilon\right\}\right) = 0.$$
The interpretation of this quantity is the same as the one you saw in Definition 3.14.
We have another term for the special case of convergence in measure when we are dealing with a probability space, too:
Definition 3.25 (Convergence in probability)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $(X_n)_{n \in \mathbb{N}}$ are random variables. We say that $X_n$ converges to $X$ in probability, written $X_n \xrightarrow{p} X$, if for every $\epsilon > 0$:

$$\mathbb{P}\left(|X_n - X| > \epsilon\right) \rightarrow 0 \quad \text{as } n \rightarrow \infty.$$
So, the idea here is that a random variable converges in probability if, for any fixed tolerance $\epsilon > 0$, the probability of the set of points where $X_n$ differs from $X$ by more than $\epsilon$ shrinks to $0$.
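To make this concrete, here's a small simulation (a minimal sketch, not from the text; taking $X_n$ to be the mean of $n$ fair coin flips, which converges in probability to $1/2$, is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
trials = 100_000

# X_n = mean of n fair coin flips converges in probability to 1/2:
# P(|X_n - 1/2| > eps) -> 0 as n grows.
for n in (100, 1_000, 10_000, 100_000):
    X_n = rng.binomial(n, 0.5, size=trials) / n
    print(n, np.mean(np.abs(X_n - 0.5) > eps))   # shrinks toward 0
```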
Further, we have the same catch as we did with convergence almost surely implying convergence in probability:
Lemma 3.11 (Convergence almost surely implies convergence in probability)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $(X_n)_{n \in \mathbb{N}}$ are random variables where $X_n \xrightarrow{\text{a.s.}} X$. Then $X_n \xrightarrow{p} X$.
Proof. Direct application of Lemma 3.7, noting that a probability space is a finite measure space (its total measure is $1$).
3.3.4.2. The rest of the convergence concepts
Finally, we’ll just rattle off the convergence concepts from the last section for posterity:
Theorem 3.10 (Monotone Convergence)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $(X_n)_{n \in \mathbb{N}}$ are non-negative random variables s.t. $X_n \uparrow X$ almost surely. Then:

$$\mathbb{E}[X_n] \uparrow \mathbb{E}[X].$$
Proof. Direct application of Theorem 3.5.
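As a quick illustration of monotone convergence (a minimal sketch; truncating an exponential variable is an arbitrary choice, not from the text): the truncations $X_n = \min(X, n)$ increase pointwise to $X$, so their expectations climb to $\mathbb{E}[X]$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(2.0, size=1_000_000)   # non-negative, E[X] = 2

# X_n = min(X, n) increases pointwise to X, so E[X_n] increases to E[X].
for n in (1, 2, 4, 8):
    print(n, np.mean(np.minimum(X, n)))
print("E[X] ~", np.mean(X))   # ~2
```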
Theorem 3.11 (Dominated Convergence)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $X$ and $(X_n)_{n \in \mathbb{N}}$ are random variables where:

- $X_n \xrightarrow{\text{a.s.}} X$, and
- There exists a random variable $Y$ with $\mathbb{E}[|Y|] < \infty$ s.t. $|X_n| \leq Y$ for all $n \in \mathbb{N}$.

Then:

$$\mathbb{E}[X_n] \rightarrow \mathbb{E}[X].$$
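As a sketch of dominated convergence (the choice of sequence and distribution is arbitrary, not from the text): $X_n = X \mathbb{1}\{|X| \leq n\}$ converges to $X$ pointwise and is dominated by $|X|$, which is integrable here, so the expectations converge:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=1_000_000)   # E[X] = 1, E[|X|] < inf

# X_n = X * 1{|X| <= n} -> X pointwise, and |X_n| <= |X| with E[|X|] < inf.
for n in (1, 2, 4, 8):
    X_n = np.where(np.abs(X) <= n, X, 0.0)
    print(n, np.mean(X_n))    # approaches E[X] ~ 1
print("E[X] ~", np.mean(X))
```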
3.3.5. Change of variables
Obviously, the change of variables formula applies here, too:
Lemma 3.12 (Change of variables)
Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, that $X : \Omega \rightarrow E$ is a random variable taking values in a measurable space $(E, \mathcal{E})$, and that $g : E \rightarrow \mathbb{R}$ is a measurable function where either:

- $g \geq 0$, or
- $\mathbb{E}[|g(X)|] < \infty$.

Then with $\mu_X \triangleq \mathbb{P} \circ X^{-1}$ the law (distribution) of $X$:

$$\mathbb{E}[g(X)] = \int_E g \,\mathrm{d}\mu_X.$$

Proof. Direct application of the change of variables property for integrals from earlier in this chapter.
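To close the section, here's a minimal sketch (not from the text; the fair-die example is an arbitrary choice) computing the two sides of the change of variables formula: averaging $g(X)$ over samples from the underlying space versus integrating $g$ against the law $\mu_X$:

```python
import numpy as np

rng = np.random.default_rng(0)

# X is a fair die roll; its law mu_X puts mass 1/6 on each of 1..6.
g = lambda x: x**2

# Left side: E[g(X)], estimated by sampling from the underlying space.
samples = rng.integers(1, 7, size=1_000_000)
lhs = np.mean(g(samples))

# Right side: integral of g against the law mu_X (a finite sum here).
values = np.arange(1, 7)
rhs = np.sum(g(values) * (1 / 6))   # = 91/6 ~ 15.1667

print(lhs, rhs)
```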