1.1. Limits
To begin our discussion of probability theory, we start with the most fundamental concept you should have covered in real analysis: limits. All of this material should be review from your real analysis course, and you should know these results inside and out. We’ll briefly cover the basics here, just to familiarize you with the particular notation we use in this book.
1.1.1. Set Theory Basics
(Set)
A set \(\mathcal X\) is a collection of distinct objects.
This collection of objects is left to the mind of the reader; it might be the natural numbers \(\mathbb N\), it might be the real numbers \(\mathbb R\), it might be the items sitting on your desk next to you. When sets are countable, we will often index their elements. For instance, if \(\mathcal X\) is a set with \(n\) elements, we might illustrate that by writing \(\mathcal X = \left\{x_i\right\}_{i = 1}^n\). In this case, \(x_i\) is the \(i^{th}\) element of the set \(\mathcal X\). Recall from real analysis that sets can be finite (their elements can be enumerated using a fixed number of natural numbers), countably infinite (their elements cannot be enumerated using a fixed number of natural numbers, but can be put in bijection with the natural numbers), or uncountably infinite (their cardinality exceeds that of the natural numbers). There are other “orders of size” to quantify sets that are uncountably infinite, but they will not be the focus of this book.
Often in probability theory, we’re going to be interested in the behavior of sequences of numbers, which brings us to our next definition:
(Sequence)
A sequence on a set \(\mathcal X\) is a function \(x : \mathbb N \rightarrow \mathcal X\).
For instance, if we had a sequence on \(\mathbb R\), our sequence might specify that \(x(1) = 1\), \(x(2) = -\frac{1}{2}\), \(x(3) = \frac{1}{3}\), \(x(4) = -\frac{1}{4}\), and so on, known as the alternating harmonic sequence. Writing the extra parentheses can get cumbersome, so when we’re talking about sequences, we will often overload the notation we used for sets a little bit and just write \(x_i\) for short. We might also express the whole sequence more compactly, with an expression similar to the one we used to define sets: \((x_n)_{n \in \mathbb N}\). We might shorten this even further by just writing \((x_n)\) when the index set is clear. Here, we’re using the parentheses to make explicit that a sequence is distinct from a set, in that a sequence is ordered and can have duplicate values. So, notationally, when you see \(\{\cdot\}\), you can expect that every element is unique (it is a set). When you see \((\cdot)\), you can anticipate that it doesn’t matter whether every element is unique (it is a sequence). In symbols, we could express this sequence as:
\[(x_n)_{n \in \mathbb N} = \left(\frac{(-1)^{n + 1}}{n}\right)_{n \in \mathbb N} \tag{1.1}\]
##TODOFIGURE
Notice the number to the right of the equation. This will let us talk about what is discussed in this equation later on.
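The idea that a sequence is just a function on the natural numbers can be sketched in Python. This is a minimal illustration, not notation from this book: the function name `x` and the use of `fractions.Fraction` for exact arithmetic are assumptions, and the closed form \((-1)^{n+1}/n\) is inferred from the terms listed above.

```python
from fractions import Fraction


def x(n: int) -> Fraction:
    """Return the n-th term (n = 1, 2, 3, ...) of the sequence 1, -1/2, 1/3, -1/4, ..."""
    if n < 1:
        raise ValueError("the sequence is indexed by n = 1, 2, 3, ...")
    sign = 1 if n % 2 == 1 else -1
    return Fraction(sign, n)


# A list keeps the order of the terms, just as the (.) notation does.
first_terms = [x(n) for n in range(1, 5)]
print(first_terms)

# A constant sequence has many terms but only one distinct value,
# which is the difference between (.) and {.}.
constant_terms = [Fraction(1) for _ in range(5)]
distinct_values = set(constant_terms)
print(len(constant_terms), len(distinct_values))
```

The list plays the role of \((x_n)\) truncated to finitely many terms, while the `set` plays the role of \(\{x_n\}\).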
Next, we’ll talk about another useful quantity, the sub-sequence:
(Sub-sequence)
Let \((x_n) \subset \mathbb R\) be a sequence. \((y_n) \subset \mathbb R\) is a sub-sequence of \((x_n)\) if there exists a strictly increasing function \(m : \mathbb N \rightarrow \mathbb N\) s.t. \(y_n = x_{m(n)}\) for all \(n \in \mathbb N\).
Notice that since \(m\) is strictly increasing, \(m(i) > m(i - 1)\) for every \(i > 1\), and consequently \(m(i) \geq i\) for every \(i \in \mathbb N\).
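As a rough sketch of the definition, the snippet below builds a sub-sequence of the sequence \(1, -\frac{1}{2}, \frac{1}{3}, \ldots\) by composing it with an index map. The choice \(m(n) = 2n - 1\) (picking the odd indices) and the names `x`, `m`, and `y` are illustrative assumptions.

```python
from fractions import Fraction


def x(n: int) -> Fraction:
    """n-th term of the sequence 1, -1/2, 1/3, -1/4, ..."""
    return Fraction(1 if n % 2 == 1 else -1, n)


def m(n: int) -> int:
    """A strictly increasing index map: selects the indices 1, 3, 5, ..."""
    return 2 * n - 1


def y(n: int) -> Fraction:
    """The sub-sequence y_n = x_{m(n)}."""
    return x(m(n))


indices = [m(n) for n in range(1, 6)]
sub_terms = [y(n) for n in range(1, 6)]
print(indices)
print(sub_terms)

# The two properties noted in the text, checked on the first few indices only.
strictly_increasing = all(m(i) > m(i - 1) for i in range(2, 50))
never_behind = all(m(i) >= i for i in range(1, 50))
print(strictly_increasing, never_behind)
```

Checking finitely many indices does not prove anything about \(m\); it only illustrates what the two properties say.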
1.1.2. Notable quantities in set theory
Now that we have sets and sequences, we’re ready to recall some notable quantities that you learned about in real analysis. We’ll begin with the concepts of upper and lower bounds.
(Upper (Lower) Bounds)
Let \(\mathcal X \subset \mathbb R\). Then \(y \in \mathbb R\) is an upper (lower) bound of \(\mathcal X\) if \(\forall x \in \mathcal X\), \(x \leq y\) (\(x \geq y\)).
Notice that by this definition, if we were to let \(\mathcal X = \{x_n\}\) be the set of values of the alternating sequence we defined above, suitable upper bounds would be any number greater than or equal to \(1\), and suitable lower bounds would be any number less than or equal to \(-\frac{1}{2}\). But what importance do \(1\) and \(-\frac{1}{2}\) have? These are known as the supremum and the infimum of \(\mathcal X\), respectively:
(Supremum)
Let \(\mathcal X \subset \mathbb R\). \(y \in \mathbb R\) is a supremum of \(\mathcal X\) if it is the least upper bound of \(\mathcal X\). We write that \(y = \sup \mathcal X\).
(Infimum)
Let \(\mathcal X \subset \mathbb R\). \(y \in \mathbb R\) is an infimum of \(\mathcal X\) if it is the greatest lower bound of \(\mathcal X\). We write that \(y = \inf \mathcal X\).
As you can see here, \(1\) is the biggest number in \(\{x_n\}\), as all the other terms are smaller than it by construction, which means \(\sup \{x_n\} = 1\). Likewise, \(-\frac{1}{2}\) is the smallest number in \(\{x_n\}\), so \(\inf \left\{x_n\right\} = -\frac{1}{2}\). Sets and sequences do not necessarily have finite supremums or infimums (or even upper or lower bounds; \(\mathbb N\), for instance, has no upper bound), but they tend to be really interesting when they do, which is why these concepts will be essential in probability theory. These definitions (upper/lower bounds, and supremums/infimums) extend directly to sequences (and consequently, sub-sequences).
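A quick numerical sanity check of these two values, under the assumption that we only look at the first `N` terms (the cut-off `N` and the variable names are hypothetical). For this particular sequence the largest and smallest values occur among the first two terms, so the finite check agrees with the supremum and infimum; in general a finite prefix can only suggest them.

```python
from fractions import Fraction

N = 1000  # hypothetical finite horizon; the real sequence never stops

# The set of values {x_n} for n = 1, ..., N, with x_n = (-1)^(n+1) / n.
values = {Fraction((-1) ** (n + 1), n) for n in range(1, N + 1)}

largest = max(values)
smallest = min(values)
print(largest, smallest)

# 1 is an upper bound and -1/2 is a lower bound of every value we generated.
is_upper_bound = all(v <= Fraction(1) for v in values)
is_lower_bound = all(v >= Fraction(-1, 2) for v in values)
print(is_upper_bound, is_lower_bound)
```

Because both bounds are attained by elements of the set, no smaller upper bound or larger lower bound can exist, which is what makes them the supremum and infimum.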
Next, we get to the concept of limits:
(Limit)
Let \(\mathcal X \subset \mathbb R\), and let \((x_n)\) be a sequence on \(\mathcal X\). The sequence converges to \(x \in \mathbb R\) if \(\forall \epsilon > 0\), there exists an \(m \in \mathbb N\) such that (s.t.) for all \(i > m\):
\[|x_i - x| < \epsilon,\]
and is written as either \(\lim_{n \rightarrow \infty} x_n = x\) or \(x_n \xrightarrow[n \rightarrow \infty]{} x\).
If a sequence \((x_n)\) has a limit \(x\) that is finite, we say that \((x_n)\) converges to \(x\). If the sequence doesn’t have a limit (think of a sequence of alternating \(1\)s and \(2\)s) or the limit is infinite (positive or negative), we say that the sequence diverges.
Let’s think briefly about what this statement means conceptually. If we were to choose a particular value of \(\epsilon\), we could always find an index of the sequence \(m \in \mathbb N\) beyond which all the remaining values are within a distance \(\epsilon\) of \(x\) (and possibly equal to it). We could make \(\epsilon\) as small as we wanted, and there would always be some index of the sequence where this was true. For the sequence in Equation (1.1), for instance, the limit is \(0\). If we chose \(\epsilon = \frac{1}{3}\), then for any \(i > 3\), \(|x_i - 0| = \frac{1}{i} < \frac{1}{3}\). We could repeat this for \(\epsilon = \frac{1}{4}\) (taking \(i > 4\)), and so on for any \(\epsilon\) exceeding zero.
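The search for an index \(m\) given \(\epsilon\) can be sketched numerically. The helper `smallest_m` and the cut-off `HORIZON` below are hypothetical: a computer can only inspect finitely many terms, so this illustrates the definition for the sequence \(x_i = (-1)^{i+1}/i\) rather than proving convergence.

```python
from fractions import Fraction

HORIZON = 2000        # hypothetical cut-off on how many terms we inspect
LIMIT = Fraction(0)   # the candidate limit x


def x(i: int) -> Fraction:
    """i-th term of the sequence 1, -1/2, 1/3, -1/4, ..."""
    return Fraction((-1) ** (i + 1), i)


def smallest_m(eps: Fraction, horizon: int = HORIZON) -> int:
    """Smallest m such that |x_i - LIMIT| < eps for every m < i <= horizon."""
    last_bad = 0  # last index whose term is NOT within eps of the limit
    for i in range(1, horizon + 1):
        if abs(x(i) - LIMIT) >= eps:
            last_bad = i
    return last_bad


m_third = smallest_m(Fraction(1, 3))
m_quarter = smallest_m(Fraction(1, 4))
m_hundredth = smallest_m(Fraction(1, 100))
print(m_third, m_quarter, m_hundredth)
```

Shrinking \(\epsilon\) pushes the required index further out, but an index always exists, which is exactly what the definition asks for.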
Next, let’s remember two extremely important related concepts, the limit superior and inferior:
(Limit Superior)
Let \(\mathcal X \subset \mathbb R\), and let \((x_n)\) be a sequence on \(\mathcal X\). The limit superior is defined as:
\[\limsup_{n \rightarrow \infty} x_n = \lim_{n \rightarrow \infty} \sup_{i > n} x_i.\]
(Limit Inferior)
Let \(\mathcal X \subset \mathbb R\), and let \((x_n)\) be a sequence on \(\mathcal X\). The limit inferior is defined as:
\[\liminf_{n \rightarrow \infty} x_n = \lim_{n \rightarrow \infty} \inf_{i > n} x_i.\]
Conceptually, these can be thought of as upper and lower bounds for the tails of the sequence. The tail of the sequence is the subsequence which includes every number coming after a particular number, here denoted by \(n\). Let’s take a look at what the sups of the tails look like for our sequence we saw above in Equation (1.1):
\(\sup_{i > 1} x_i = \frac{1}{3}\), because we’ve cut off the first element of our sequence. The maximum of the remaining entries is \(\frac{1}{3}\), which means that the sup is \(\frac{1}{3}\).
\(\sup_{i > 2} x_i = \frac{1}{3}\), because we’ve cut off the first two elements of our sequence. The maximum of the remaining entries is \(\frac{1}{3}\), which means that the sup is still \(\frac{1}{3}\).
\(\sup_{i > 3} x_i = \frac{1}{5}\), because we’ve cut off the first three elements of our sequence. The maximum of the remaining entries is \(\frac{1}{5}\), which means that the sup is \(\frac{1}{5}\).
\(\sup_{i > 4} x_i = \frac{1}{5}\), because we’ve cut off the first four elements of our sequence. The maximum of the remaining entries is \(\frac{1}{5}\), which means that the sup is still \(\frac{1}{5}\).
##TODO FIGURE
If you continue this logic on and on, extending that tail further and further out, you eventually come to the conclusion that the limit of these sups is \(0\). You can repeat this logic to see that the limit of the infs will also be \(0\), which leads us to our first useful theorem:
(Limit Superior and Inferior Equality)
Let \(\mathcal X \subset \mathbb R\), let \((x_n)\) be a sequence on \(\mathcal X\), and let \(x \in \mathbb R\). Then \(x_n \xrightarrow[n \rightarrow \infty]{} x \iff \liminf_{n \rightarrow \infty}x_n = \limsup_{n \rightarrow \infty}x_n = x\).
The notation \(\iff\) is used to indicate “if and only if”, which means that each side of the statement implies the other. If the sequence has a limit, the limit inferior and limit superior are equal. Likewise, if the limit inferior and the limit superior are equal, the sequence has a limit.
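The theorem can be illustrated by comparing tail sups and tail infs for two sequences: the sequence \(1, -\frac{1}{2}, \frac{1}{3}, \ldots\), which converges to \(0\), and the alternating \(1\)s and \(2\)s mentioned earlier, which has no limit. This is only a sketch: the helper names and the cut-off `HORIZON` are assumptions, and each tail sup or inf is approximated by a max or min over finitely many terms (for these two sequences the approximation happens to be exact as long as the tail contains at least two terms).

```python
from fractions import Fraction

HORIZON = 400  # hypothetical cut-off; true tails are infinite


def alternating_harmonic(i: int) -> Fraction:
    """Terms 1, -1/2, 1/3, -1/4, ..."""
    return Fraction((-1) ** (i + 1), i)


def one_two(i: int) -> Fraction:
    """Terms 1, 2, 1, 2, ... (a sequence with no limit)."""
    return Fraction(1 if i % 2 == 1 else 2)


def tail_sup(x, n: int, horizon: int = HORIZON) -> Fraction:
    """Approximate sup over i > n by the max over n < i <= horizon."""
    return max(x(i) for i in range(n + 1, horizon + 1))


def tail_inf(x, n: int, horizon: int = HORIZON) -> Fraction:
    """Approximate inf over i > n by the min over n < i <= horizon."""
    return min(x(i) for i in range(n + 1, horizon + 1))


# The four tail sups worked out by hand in the text.
harmonic_sups = [tail_sup(alternating_harmonic, n) for n in range(1, 5)]
print(harmonic_sups)

# Far out in the tail, sup and inf squeeze together for the convergent sequence...
harmonic_gap = tail_sup(alternating_harmonic, 300) - tail_inf(alternating_harmonic, 300)
# ...but stay a fixed distance apart for the sequence of 1s and 2s.
one_two_gap = tail_sup(one_two, 300) - tail_inf(one_two, 300)
print(harmonic_gap, one_two_gap)
```

A gap that shrinks toward zero is what equality of the limit superior and limit inferior looks like numerically; a gap that stays at \(1\) signals that no limit exists.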
(Equivalent denotations for \(\liminf\) and \(\limsup\))
Let \(\mathcal X \subset \mathbb R\), and let \((x_n)\) be a sequence on \(\mathcal X\). Then \(\limsup_{n \rightarrow \infty} x_n = \lim_{n \rightarrow \infty}\sup_{i > n}x_i \equiv \inf_{n \in \mathbb N} \sup_{i > n}x_i\), and \(\liminf_{n \rightarrow \infty} x_n = \lim_{n \rightarrow \infty}\inf_{i > n}x_i \equiv \sup_{n \in \mathbb N}\inf_{i > n}x_i\).
Conceptually, this denotation is going to be really important in this book, so we’re going to prove it below for the \(\limsup\) case. The \(\liminf\) case is virtually identical:
Proof. Let \(y_n = \sup_{i > n}\left\{x_i\right\}\). Notice that for any \(n \in \mathbb N\), \(y_n \geq y_{n + 1}\), because \(y_{n + 1}\) is the supremum over the tail \((x_i : i > n + 1)\), which is contained in the tail \((x_i : i > n)\) over which \(y_n\) is the supremum.
Therefore \((y_n)\) is a decreasing sequence.
Since \((y_n)\) is a decreasing sequence, its limit will be its infimum (possibly \(-\infty\)).
For the other equivalence, simply note that the infimum over \(i > n\) would give us an increasing sequence by the same argument, and therefore its limit will be its supremum (possibly \(\infty\)).
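The key step of the proof, that the tail sups \(y_n\) never increase, can be checked numerically for the sequence \(x_i = (-1)^{i+1}/i\). This is a sketch under stated assumptions: the cut-off `HORIZON` and the names `x`, `y`, and `ys` are hypothetical, and only the first 50 tail sups are examined.

```python
from fractions import Fraction

HORIZON = 200  # hypothetical cut-off used to approximate each infinite tail


def x(i: int) -> Fraction:
    """i-th term of the sequence 1, -1/2, 1/3, -1/4, ..."""
    return Fraction((-1) ** (i + 1), i)


def y(n: int, horizon: int = HORIZON) -> Fraction:
    """Tail sup y_n, approximated by the max over n < i <= horizon."""
    return max(x(i) for i in range(n + 1, horizon + 1))


ys = [y(n) for n in range(1, 51)]

# Each tail is contained in the previous one, so the sups can only go down.
is_decreasing = all(a >= b for a, b in zip(ys, ys[1:]))

# For a decreasing sequence, the infimum of the terms seen so far is the latest term.
running_inf = min(ys)
print(is_decreasing, running_inf, ys[-1])
```

As more tail sups are included, the running infimum tracks the most recent \(y_n\), which is the finite-sample picture of the limit of a decreasing sequence being its infimum.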
1.1.3. Convergence from above and below
Next, we have two pieces of shorthand whose meaning should be clear to you from real analysis, even if you have not seen the symbols before. They are the symbols for convergence from above and below:
(Convergence from above)
Let \(\mathcal X \subset \mathbb R\), and let \((x_n)\) be a sequence on \(\mathcal X\). Suppose further that \(x_n \xrightarrow[n \rightarrow \infty]{} x\), where \(x \in \mathbb R\) is finite. Then \(x_n\) converges from above to \(x\) if \(x_n \geq x\) for all \(n\), and can also be written \(x_n \downarrow x\) as \(n \rightarrow \infty\).
(Convergence from below)
Let \(\mathcal X \subset \mathbb R\), and let \((x_n)\) be a sequence on \(\mathcal X\). Suppose further that \(x_n \xrightarrow[n \rightarrow \infty]{} x\), where \(x \in \mathbb R\) is finite. Then \(x_n\) converges from below to \(x\) if \(x_n \leq x\) for all \(n\), and can also be written \(x_n \uparrow x\) as \(n \rightarrow \infty\).
The idea here is that with convergence from above, the sequence is converging to the limit \(x\), but all of the terms \(x_n\) are at least \(x\). The logic is the same with convergence from below, where all of the terms are at most \(x\).
For instance, consider the sequence \((x_n)\) where \(x_n = \frac{1}{n}\). Notice that this sequence has a limit of \(0\), but that \(\frac{1}{n} > 0\) for all \(n\). Then \(x_n \downarrow 0\) as \(n \rightarrow \infty\).
Likewise, consider the sequence \((x_n)\) where \(x_n = -\frac{1}{n}\). Notice that this sequence has a limit of \(0\), but that \(-\frac{1}{n} < 0\) for all \(n\). Then \(x_n \uparrow 0\) as \(n \rightarrow \infty\).
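Both examples can be checked on a finite prefix. The snippet below is an illustrative sketch: the cut-off `N` and the variable names are assumptions, and inspecting finitely many terms only suggests, rather than proves, the behavior as \(n \rightarrow \infty\).

```python
from fractions import Fraction

N = 1000             # hypothetical number of terms to inspect
LIMIT = Fraction(0)  # the common limit of both sequences

above = [Fraction(1, n) for n in range(1, N + 1)]    # x_n = 1/n
below = [Fraction(-1, n) for n in range(1, N + 1)]   # x_n = -1/n

# Convergence from above: every term sits at or above the limit.
stays_above = all(term >= LIMIT for term in above)
# Convergence from below: every term sits at or below the limit.
stays_below = all(term <= LIMIT for term in below)

# The distance to the limit after N terms, for each sequence.
gap_above = above[-1] - LIMIT
gap_below = LIMIT - below[-1]
print(stays_above, stays_below, gap_above, gap_below)
```

The two sequences approach the same limit with the same gap at every index; only the side from which they approach differs, which is what the arrows \(\downarrow\) and \(\uparrow\) record.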