$$
\left\{
\begin{array}{l}
\Omega \in \mathcal{F}\,, \\
\text{if } A \in \mathcal{F}, \text{ then } \Omega - A \in \mathcal{F}\,, \\
\text{if } A_1, A_2, \cdots \in \mathcal{F}, \text{ then } \bigcup\limits_{i \in \mathbb{Z}^+} A_i \in \mathcal{F}\,.
\end{array}
\right.
$$

The component $P$ is a measure on $\mathcal{F}$, satisfying
$$
\left\{
\begin{array}{l}
P(A) \geqslant 0\,,\ \forall A \in \mathcal{F}\,, \\
P(\Omega) = 1\,, \\
\text{if } A_1, \cdots, A_k, \cdots \in \mathcal{F},\ A_i \cap A_j = \emptyset\ (i \neq j), \text{ then } P\Big(\bigcup\limits_k A_k\Big) = \sum\limits_k P(A_k)\,.
\end{array}
\right.
$$
**Examples**

- $\Omega = \{1, 2, \cdots, N\}$, $\mathcal{F} = \{\text{all subsets of } \Omega\}$, and $P(A) = \dfrac{1}{N}\cdot(\text{cardinality of } A)$.
- $\Omega$ is the interior of the circle with center $(0,0)$ and radius $1$, $\mathcal{F} = \{\text{all measurable subsets of } \Omega\}$, and $P(A) = \dfrac{\text{measure of } A}{\pi}$.
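As a quick numerical illustration of the second example (my addition, not part of the original notes), the sketch below draws points uniformly from the disk $\Omega$ and estimates $P(A) = \text{measure}(A)/\pi$ for a hypothetical event $A$, the quarter of the disk lying in the first quadrant (so the exact answer is $1/4$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly in the unit disk by rejection from the square [-1, 1]^2.
n = 1_000_000
pts = rng.uniform(-1.0, 1.0, size=(n, 2))
inside = pts[np.sum(pts**2, axis=1) < 1.0]      # points of the sample space Omega

# Event A: the part of the disk in the first quadrant (exact P(A) = 1/4).
in_A = (inside[:, 0] > 0) & (inside[:, 1] > 0)

print("estimated P(A):", in_A.mean())           # ~0.25
print("exact     P(A):", (np.pi / 4) / np.pi)   # measure of A divided by pi
```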
**Random Variable**

Let a probability space $(\Omega, \mathcal{F}, P)$ be given. A random variable is a function $X$ from $\Omega$ to the real axis $\mathbb{R}$ such that $\{ \omega \in \Omega : X(\omega) \leqslant C \} \in \mathcal{F}$ for every constant $C$.
The cumulative distribution function (CDF) is defined by $F(x) = P(X \leqslant x)$. It satisfies the following properties:

- $F$ is nondecreasing;
- $F(x) \to 1$ as $x \to +\infty$;
- $F(x) \to 0$ as $x \to -\infty$;
- $F(x)$ is right continuous.
**Discrete Random Variables**

An integer-valued random variable is called a discrete random variable. Its cumulative distribution function is

$$
F(x) = P(X \leqslant x) = \sum\limits_{k \leqslant x} p(k)\,,
$$

where $x \in \mathbb{R}$ and $p(k) = P(X = k)$. The collection $\{p(k)\}$ is called the probability mass function (PMF). It is clear that $p(k) \geqslant 0$ and $\sum\limits_{k} p(k) = 1$.
Its expectation (mean) and variance are

$$
E(X) = \sum\limits_k k\, p(k)\,,\quad Var(X) = \sum\limits_k (k-\mu)^2 p(k)\,,\quad \mu = E(X)\,.
$$
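These two sums are easy to evaluate directly. A minimal sketch (my addition), using a hypothetical PMF, a fair six-sided die:

```python
import numpy as np

# Hypothetical PMF: a fair six-sided die, p(k) = 1/6 for k = 1, ..., 6.
k = np.arange(1, 7)
p = np.full(6, 1 / 6)

mu = np.sum(k * p)                  # E(X)   = sum_k k p(k)
var = np.sum((k - mu) ** 2 * p)     # Var(X) = sum_k (k - mu)^2 p(k)

print(mu, var)                      # 3.5 and 35/12 ≈ 2.9167
```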
**Examples**

Binomial random variable (the number of successes in $n$ independent Bernoulli trials, each with success probability $r$):

$$
p(k) = \begin{dcases} \frac{n!}{k!(n-k)!}(1-r)^{n-k}r^k & k = 0, 1, \cdots, n \\ 0 & \text{otherwise} \end{dcases}
$$

$$
E(X) = nr\,,\quad Var(X) = nr(1-r)\,.
$$
Poisson random variable:

$$
p(k) = \begin{dcases} \frac{\lambda^k}{k!}e^{-\lambda} & k \geqslant 0 \\ 0 & \text{otherwise} \end{dcases}
$$

$$
E(X) = \lambda\,,\quad Var(X) = \lambda\,.
$$
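The stated means and variances can be checked directly from the two PMFs. A sketch (my addition, with arbitrarily chosen parameters; the Poisson sum is truncated where the tail mass is negligible):

```python
import numpy as np
from math import comb, exp, factorial

# Binomial: n = 10 trials, success probability r = 0.3.
n, r = 10, 0.3
k = np.arange(n + 1)
p_binom = np.array([comb(n, j) * r**j * (1 - r)**(n - j) for j in k])
mu = np.sum(k * p_binom)
print(mu, np.sum((k - mu)**2 * p_binom))   # 3.0, 2.1  (= nr, nr(1-r))

# Poisson: lambda = 4, truncated at k = 59 (remaining tail mass is negligible).
lam = 4.0
k = np.arange(60)
p_pois = np.array([lam**j / factorial(j) * exp(-lam) for j in k])
mu = np.sum(k * p_pois)
print(mu, np.sum((k - mu)**2 * p_pois))    # ≈ 4.0, 4.0  (= lambda, lambda)
```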
**Continuous Random Variables**

A real-valued random variable is called a continuous random variable. Its cumulative distribution function (CDF) is $F(x) = P(X \leqslant x)$.

If there exists a non-negative integrable function $p(x)$ such that $F(x) = \displaystyle\int_{-\infty}^x p(t)\,dt$, then $p(x)$ is called the probability density function (PDF). If $p(x)$ is continuous, then $\displaystyle\frac{dF}{dx} = p(x)$.

$p(x)$ satisfies

$$
p(x) \geqslant 0\,,\quad \int_{\mathbb{R}}p(x)\,dx = 1\,,\quad P(x_1<X\leqslant x_2) = \int_{x_1}^{x_2} p(t)\,dt\,.
$$

Its expectation (mean) and variance are

$$
E(X) = \int_{\mathbb{R}}x\,p(x)\,dx\,,\quad Var(X) = \int_{\mathbb{R}}(x-\mu)^2 p(x)\,dx\,,\quad \mu = E(X)\,.
$$
**Examples**

Uniform random variable on $[a, b]$:

$$
p(x) = \begin{dcases} \frac{1}{b-a} & a\leqslant x \leqslant b \\ 0 & \text{otherwise} \end{dcases}
$$

$$
E(X) = \frac{a+b}{2}\,,\quad Var(X) = \frac{1}{12}(b-a)^2\,.
$$
Gaussian random variable:

$$
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

$$
E(X) = \mu\,,\quad Var(X) = \sigma^2\,.
$$

It is also called the normal random variable, denoted by $N(\mu, \sigma^2)$.
Gamma random variable:

$$
p(x) = \frac{1}{\Gamma(\alpha)\, \beta^\alpha}\, x^{\alpha - 1}e^{-x/\beta}\,,\quad (0<x<\infty,\ \alpha>0,\ \beta>0)
$$

$$
E(X) = \alpha\beta\,,\quad Var(X) = \alpha\beta^2\,.
$$
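The closed-form means and variances above can be recovered by integrating the densities numerically; a sketch of this (my addition, with arbitrarily chosen parameters):

```python
from math import gamma, pi, exp, sqrt, inf
from scipy.integrate import quad

def moments(p, lo, hi):
    """E(X) and Var(X) of a density p on (lo, hi), by numerical integration."""
    mu = quad(lambda x: x * p(x), lo, hi)[0]
    var = quad(lambda x: (x - mu) ** 2 * p(x), lo, hi)[0]
    return mu, var

# Uniform on [a, b] = [2, 5]:           expect (3.5, 0.75)
print(moments(lambda x: 1 / 3, 2, 5))

# Gaussian with mu = 1, sigma = 2:      expect (1.0, 4.0)
print(moments(lambda x: exp(-(x - 1)**2 / 8) / (sqrt(2 * pi) * 2), -inf, inf))

# Gamma with alpha = 3, beta = 2:       expect (6.0, 12.0)
print(moments(lambda x: x**2 * exp(-x / 2) / (gamma(3) * 2**3), 0, inf))
```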
**Properties of Expectation and Variance**

The expectation of a function $f(X)$ of a random variable $X$ is

$$
E(f(X)) = \sum_k f(k)\, p(k) \quad (\text{discrete})\,, \qquad E(f(X)) = \int_{\mathbb{R}} f(x)\, p(x)\,dx \quad (\text{continuous})\,.
$$
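For instance, $E(f(X))$ can be computed this way without first finding the distribution of $f(X)$. A short sketch (my addition, using the hypothetical fair-die PMF from earlier and a uniform density):

```python
from math import e, exp
from scipy.integrate import quad

# Discrete case: f(x) = x^2 for a fair die, E(f(X)) = sum_k f(k) p(k).
print(sum(k**2 * (1 / 6) for k in range(1, 7)))           # 91/6 ≈ 15.1667

# Continuous case: f(x) = exp(x) for X uniform on [0, 1], E(f(X)) = integral of f(x) p(x).
print(quad(lambda x: exp(x) * 1.0, 0.0, 1.0)[0], e - 1)   # both ≈ 1.71828
```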
If, for arbitrary constants $a$ and $b$, $P(X \leqslant a,\ Y \leqslant b) = P(X \leqslant a)\, P(Y \leqslant b)$, then the random variables $X$ and $Y$ are called independent.
**Property**

Let $X$ and $Y$ be random variables and $c$ and $d$ be constants. Then:

- (Linearity) $E[cX+dY] = cE(X)+dE(Y)$.
- (Schwarz inequality) $E(XY) \leqslant (E(X^2))^{1/2} (E(Y^2))^{1/2}$.
- (Preservation of order) If $X \leqslant Y$, then $E(X) \leqslant E(Y)$.
- $Var(X) = EX^2 -(EX)^2$.

If $X$ and $Y$ are independent, then

$$
\begin{array}{rcl} E[XY] &=& E[X]E[Y] \\ Var(X+Y) &=& Var(X) + Var(Y) \\ Var(X-Y) &=& Var(X) + Var(Y) \end{array}
$$
**Proof**

$$
\begin{array}{rlr}
Var(X+Y) &= E(X+Y)^2 - (E(X+Y))^2 \\
&= (EX^2+EY^2 + 2E(XY)) - ((EX)^2 + (EY)^2 + 2EXEY) \\
&= (EX^2+EY^2 + 2EXEY) - ((EX)^2 + (EY)^2 + 2EXEY) & (E(XY) = EXEY\ \text{by independence}) \\
&= (EX^2 - (EX)^2) + (EY^2 - (EY)^2) \\
&= Var(X) + Var(Y)\,. & \square
\end{array}
$$
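A quick Monte Carlo sanity check of the last two identities (my addition; the distributions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Independent X ~ Uniform[0, 1] (Var = 1/12) and Y ~ Exponential(1) (Var = 1).
x = rng.uniform(0.0, 1.0, n)
y = rng.exponential(1.0, n)

print(np.var(x + y), np.var(x - y), 1 / 12 + 1.0)   # all three ≈ 1.0833
```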
**Distribution of a function of a random variable**

Suppose that a random variable $Y$ is a function of a random variable $X$, i.e. $Y = f(X)$. Then the PDF of $Y$ can be determined from the PDF of $X$.
**Examples**

Given a random variable $Y = X^2$. If $X$ has a PDF $p_X(x)$, we need to find $p_Y(y)$. Denote the distribution functions of $X$ and $Y$ by $F_X(x)$ and $F_Y(y)$, respectively.

For $y > 0$,

$$
F_Y(y) = P(Y \leqslant y) = P(X^2 \leqslant y) = P(-\sqrt{y} \leqslant X \leqslant \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y})\,.
$$

Since $p_Y(y) = \dfrac{dF_Y(y)}{dy}$, it follows that

$$
p_Y(y) = \begin{dcases} \frac{1}{2\sqrt{y}} \big( p_X(\sqrt{y}) + p_X(-\sqrt{y}) \big) & y > 0 \\ 0 & y \leqslant 0\,. \end{dcases}
$$
When $X$ is a Gaussian random variable $N(0, 1)$, its PDF is $p_X(x) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, and then $Y$ has the PDF

$$
p_Y(y) = \begin{dcases} \frac{1}{\sqrt{2\pi}}\, y^{-\frac{1}{2}} e^{-\frac{y}{2}} & y > 0 \\ 0 & y \leqslant 0\,. \end{dcases}
$$

This is just the $\chi^2$ distribution with one degree of freedom.
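A short simulation (my addition) comparing samples of $Y = X^2$, for standard normal $X$, against the density just derived:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(1_000_000) ** 2           # samples of Y = X^2, X ~ N(0, 1)

# Derived density of Y for y > 0.
p_Y = lambda t: np.exp(-t / 2) / np.sqrt(2 * np.pi * t)

# Compare empirical interval probabilities with a midpoint-rule integral of p_Y.
for a, b in [(0.1, 0.5), (0.5, 1.0), (1.0, 2.0)]:
    empirical = np.mean((a < y) & (y <= b))
    edges = np.linspace(a, b, 1001)
    mids = (edges[:-1] + edges[1:]) / 2
    from_density = np.sum(p_Y(mids)) * (edges[1] - edges[0])
    print(f"P({a} < Y <= {b}):  empirical {empirical:.4f}   from density {from_density:.4f}")
```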
**Theorem**

Let a random variable $X$ have the PDF $p_X(x)$ and let $g(x)$ be a differentiable function on $\mathbb{R}$. If $g'(x) > 0$ for $x \in \mathbb{R}$, and $a = \lim\limits_{x \to -\infty} g(x)$, $b = \lim\limits_{x \to +\infty} g(x)$, then $Y = g(X)$ has the PDF

$$
p_Y(y) = \begin{dcases} p_X\big(h(y)\big)\, h'(y) & a < y < b \\ 0 & \text{otherwise} \end{dcases}
\qquad h = g^{-1}\,.
$$
**Proof**

$g(x)$ is strictly increasing on $\mathbb{R}$ with $a < g(x) < b$, so its inverse function exists: $y = g(x)$, $x = h(y) = g^{-1}(y)$. Then

$$
\begin{array}{rll}
F_Y(y) &= P(g(X) \leqslant y) = 1\,, & (y > b) \\
F_Y(y) &= P(g(X) \leqslant y) = 0\,, & (y < a) \\
F_Y(y) &= P(g(X) \leqslant y) = P(X \leqslant g^{-1}(y)) = P(X \leqslant h(y)) = F_X\big(h(y)\big)\,. & (a < y < b)
\end{array}
$$

Furthermore, $p_Y(y) = \dfrac{dF_Y(y)}{dy}$ gives $p_Y(y) = p_X(h(y))\, h'(y)$. $\square$
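As an illustration of the theorem (my addition), take $g(x) = e^x$, so $a = 0$, $b = +\infty$, $h(y) = \ln y$, $h'(y) = 1/y$, and let $X$ be standard normal; the formula gives the lognormal density, which the sketch below checks against simulated samples:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = np.exp(x)                                    # Y = g(X) with g(x) = e^x

# Density from the theorem: p_Y(y) = p_X(h(y)) h'(y), with h(y) = ln y, h'(y) = 1/y.
p_X = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
p_Y = lambda t: p_X(np.log(t)) / t

# Compare P(1 < Y <= 2) from samples with a midpoint-rule integral of p_Y over (1, 2].
empirical = np.mean((1.0 < y) & (y <= 2.0))
edges = np.linspace(1.0, 2.0, 2001)
mids = (edges[:-1] + edges[1:]) / 2
from_density = np.sum(p_Y(mids)) * (edges[1] - edges[0])
print(empirical, from_density)                    # both ≈ 0.256
```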
**Characteristic function**

The characteristic function of a random variable $X$ is defined by

$$
\Phi_X(t) = E(e^{itX}) = \int_{\mathbb{R}}e^{itx}\,dF_X(x) \quad \left( = \int_{\mathbb{R}}e^{itx}p_X(x)\,dx \ \ \text{when $X$ has a PDF} \right)
$$
**Example**

Let $X$ be an exponential random variable with PDF

$$
p(x) = \begin{dcases} \lambda e^{-\lambda x}\,, & x \geqslant 0 \\ 0\,, & x < 0 \end{dcases}
$$

Its characteristic function is

$$
\Phi_X(t) = \int_{\mathbb{R}}p(x)e^{itx}\,dx = \lambda\int_0^{\infty}e^{-(\lambda-it)x}\,dx = \frac{\lambda}{\lambda-it}\,.
$$
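A quick numerical check of this formula (my addition, with $\lambda = 2$ and $t = 1.5$ chosen arbitrarily), integrating the real and imaginary parts separately:

```python
import numpy as np
from scipy.integrate import quad

lam, t = 2.0, 1.5

# Phi_X(t) = integral of e^{itx} p(x) dx, split into real and imaginary parts.
re = quad(lambda x: np.cos(t * x) * lam * np.exp(-lam * x), 0, np.inf)[0]
im = quad(lambda x: np.sin(t * x) * lam * np.exp(-lam * x), 0, np.inf)[0]

print(complex(re, im), lam / (lam - 1j * t))     # both ≈ 0.64 + 0.48j
```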
**Example**

Let $X$ be a normal random variable with mean $\mu$ and variance $\sigma^2$. Then its characteristic function is $\Phi_X(t) = e^{i\mu t - \frac{1}{2}t^2\sigma^2}$.
**Theorem**

Let $X$ and $Y$ be two independent random variables with PDFs $p(x)$ and $q(y)$, and let $Z = X + Y$. Then

$$
\Phi_Z(t) = \Phi_X(t)\, \Phi_Y(t)\,,\qquad r(z) = (p*q)(z)\,,
$$

where $r(z)$ is the PDF of $Z$.

**Property**

Let $W = cX$, where $c$ is a constant (assume $c > 0$ for the computation below). Then
$$
P(a<W<b) = P\left(\frac{a}{c} < X < \frac{b}{c}\right) = \int_{\frac{a}{c}}^{\frac{b}{c}} p(x)\,dx = \int_a^b\frac{1}{c}\,p\Big(\frac{x}{c}\Big)\,dx\,.
$$

This implies that $W$ has the PDF $\beta(x) = \dfrac{1}{c}\,p\Big(\dfrac{x}{c}\Big)$, and therefore

$$
\Phi_W(t) = E(e^{itW}) = \int_{\mathbb{R}} e^{itx}\, \frac{1}{c}\,p\Big(\frac{x}{c}\Big)\,dx = \int_{\mathbb{R}} e^{ictx}\,p(x)\,dx = \Phi_X(ct)\,.
$$
**Theorem**

Let the random variable $X_1$ be $N(\mu_1, \sigma_1^2)$ and $X_2$ be $N(\mu_2, \sigma_2^2)$. If $X_1$ and $X_2$ are independent, then the random variable $X = X_1 + X_2$ is $N(\mu, \sigma^2)$ with

$$
\mu = \mu_1 + \mu_2\,,\quad \sigma^2 = \sigma_1^2 + \sigma_2^2\,.
$$
**Proof**

$$
\Phi_X(t) = \Phi_{X_1}(t)\,\Phi_{X_2}(t)\,,\qquad
\Phi_{X_1}(t) = e^{i\mu_1 t - t^2\sigma_1^2/2}\,,\qquad
\Phi_{X_2}(t) = e^{i\mu_2 t - t^2\sigma_2^2/2}\,,
$$

$$
\Phi_X(t) = e^{i(\mu_1+\mu_2)t - t^2(\sigma_1^2 + \sigma_2^2)/2}\,.
$$

This implies that $X \sim N(\mu_1 + \mu_2,\ \sigma_1^2+\sigma_2^2)$. $\square$
**Corollary**

If $X_1, \cdots, X_n$ are independent Gaussian random variables, then any linear combination $a_1X_1+\cdots+a_nX_n$ is a Gaussian random variable.
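A simulation sketch (my addition; the parameters are chosen arbitrarily) checking the mean and variance of a sum of two independent Gaussians:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x1 = rng.normal(loc=1.0, scale=2.0, size=n)     # N(1, 4)
x2 = rng.normal(loc=-3.0, scale=1.5, size=n)    # N(-3, 2.25)
s = x1 + x2

# Expect mean mu1 + mu2 = -2 and variance sigma1^2 + sigma2^2 = 6.25.
print(s.mean(), s.var())
```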
**Jointly distributed random variables**

The joint distribution function of two random variables is defined by

$$
F(x,y) = P(X \leqslant x,\ Y \leqslant y)\,.
$$

The marginal distribution functions are

$$
F_X(x) = F(x,+\infty)\,,\qquad F_Y(y) = F(+\infty,y)\,.
$$

If $X$ and $Y$ are independent, then

$$
F(x,y) = F_X(x)\,F_Y(y)\,.
$$
Let $X$ and $Y$ be discrete random variables with $P(X = k) = p(k)$ and $P(Y = k) = g(k)$. The joint probability mass function (PMF) of $X$ and $Y$ is defined by

$$
\gamma(k,l) = P(X = k,\ Y = l)\,.
$$

**Property**

$$
\sum\limits_l\gamma(k,l) = p(k)\,,\qquad \sum\limits_k\gamma(k,l) = g(l)\,.
$$
In fact,

$$
\sum\limits_l\gamma(k,l) = \sum\limits_l P(X = k,\ Y = l) = P(X = k,\ Y \in \mathbb{Z}) = p(k)\,.
$$
If $X$ and $Y$ are independent, then

$$
P(X = k,\ Y = l) = P(X = k)\,P(Y = l)\,,\qquad \text{i.e.}\quad \gamma(k,l) = p(k)\,g(l)\,.
$$
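A small numerical sketch (my addition, using a hypothetical $2\times 3$ joint PMF) computing the marginals and testing whether $\gamma(k,l) = p(k)\,g(l)$:

```python
import numpy as np

# Hypothetical joint PMF gamma[k, l] on {0, 1} x {0, 1, 2}.
gamma = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])

p = gamma.sum(axis=1)    # marginal PMF of X: sum over l
g = gamma.sum(axis=0)    # marginal PMF of Y: sum over k

print(p, g)                                  # [0.4 0.6] and [0.25 0.5 0.25]
print(np.allclose(gamma, np.outer(p, g)))    # True: this gamma factorizes, so X and Y are independent
```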
Consider two continuous random variables $X$ and $Y$ with PDFs $p(x)$ and $g(y)$. If a non-negative integrable function $\gamma(x,y)$ exists such that

$$
F(x,y) = \int_{-\infty}^{x}\!\!\int_{-\infty}^{y}\gamma(s,t)\,dt\,ds\,,
$$

then $\gamma(x,y)$ is called the joint PDF of $X$ and $Y$.

Given $Y = y$, the conditional PDF of $X$ is

$$
\gamma(x\,|\,y) = \frac{\gamma(x,y)}{g(y)}\,.
$$
Let $u(X)$ be a function of $X$. Given $Y = y$, the conditional expectation of $u(X)$ is defined as

$$
E[u(X)\,|\,y] = \int_{\mathbb{R}}u(x)\,\gamma(x\,|\,y)\,dx\,.
$$

In particular,

$$
E[X\,|\,y] = \int_{\mathbb{R}}x\,\gamma(x\,|\,y)\,dx\,.
$$
**Property**

Let $X$ and $Y$ have the joint PDF $\gamma(x,y)$. Then $p(x) = \displaystyle\int_{\mathbb{R}} \gamma(x,y)\,dy$ and $g(y) = \displaystyle\int_{\mathbb{R}} \gamma(x,y)\,dx$. If $X$ and $Y$ are independent, then their joint PDF is $\gamma(x,y) = p(x)\,g(y)$.
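A numerical sketch (my addition, using a hypothetical joint density on the unit square) recovering the marginals by integration and showing that this particular density does not factorize:

```python
from scipy.integrate import quad

# Hypothetical joint PDF on the unit square: gamma(x, y) = x + y (integrates to 1).
gamma = lambda x, y: x + y

# Marginals by integrating out the other variable; analytically p(x) = x + 1/2, g(y) = y + 1/2.
p = lambda x: quad(lambda y: gamma(x, y), 0.0, 1.0)[0]
g = lambda y: quad(lambda x: gamma(x, y), 0.0, 1.0)[0]
print(p(0.3), g(0.7))                      # 0.8 and 1.2

# gamma(x, y) != p(x) g(y), so X and Y are not independent here.
print(gamma(0.3, 0.7), p(0.3) * g(0.7))    # 1.0 vs 0.96
```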
The covariance of two random variables is

$$
Cov(X,Y) = E[(X-EX)(Y-EY)] = E(XY) - EX\,EY\,.
$$

**Property**

Let $X, Y, Z$ be random variables and $c, d$ be constants. Then

$$
Cov(c,X) = 0\,,\qquad Cov(cX+dY,\,Z) = c\,Cov(X,Z)+d\,Cov(Y,Z)\,.
$$

If $X$ and $Y$ are independent, then $Cov(X,Y) = 0$.
**Theorem**

Let two random variables $X_1$ and $X_2$ have the joint density function $p_{X_1X_2}(x_1,x_2)$. Denote $A = \{(x_1,x_2)\in\mathbb{R}^2 : p_{X_1X_2}(x_1, x_2) \neq 0 \}$. Suppose two differentiable bivariate functions $g_1(x_1, x_2)$, $g_2(x_1,x_2)$ define a one-to-one transform

$$
U:\ y_1 = g_1(x_1, x_2)\,,\quad y_2 = g_2(x_1,x_2)
$$

of $A$ onto $B = U(A)$, and denote the inverse transform by $U^{-1}:\ x_1 = h_1(y_1,y_2)\,,\ x_2 = h_2(y_1,y_2)$. Then $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$ have the joint PDF

$$
p_{Y_1Y_2}(y_1,y_2) = \begin{dcases} p_{X_1X_2}\big(h_1(y_1,y_2),\, h_2(y_1,y_2)\big)\,\lvert J(y_1,y_2)\rvert & (y_1,y_2)\in B \\ 0 & \text{otherwise} \end{dcases}
$$

where $J(y_1,y_2) = \dfrac{\partial(h_1, h_2)}{\partial(y_1, y_2)}$ is the Jacobian determinant of the inverse transform.
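An illustration (my addition): for independent standard normals $X_1, X_2$ and the transform $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, the inverse is $h_1 = (y_1+y_2)/2$, $h_2 = (y_1-y_2)/2$ with $\lvert J\rvert = 1/2$. The sketch compares a probability computed from the resulting density with a simulation.

```python
import numpy as np
from scipy.integrate import dblquad

# Joint PDF of independent standard normals X1, X2.
p_X = lambda x1, x2: np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)

# Transform Y1 = X1 + X2, Y2 = X1 - X2; inverse h1 = (y1+y2)/2, h2 = (y1-y2)/2, |J| = 1/2.
p_Y = lambda y1, y2: p_X((y1 + y2) / 2, (y1 - y2) / 2) * 0.5

# P(0 < Y1 <= 1, 0 < Y2 <= 1) two ways: integrating p_Y, and by simulation.
from_density = dblquad(lambda y2, y1: p_Y(y1, y2), 0, 1, 0, 1)[0]

rng = np.random.default_rng(5)
x1, x2 = rng.standard_normal((2, 1_000_000))
y1, y2 = x1 + x2, x1 - x2
empirical = np.mean((0 < y1) & (y1 <= 1) & (0 < y2) & (y2 <= 1))
print(from_density, empirical)        # both ≈ 0.068
```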
The joint distribution of random variables $X_1, \cdots, X_n$ is defined as

$$
F(x_1, \cdots, x_n) = P(X_1 \leqslant x_1, \cdots, X_n \leqslant x_n)\,.
$$

If each $X_k$ is discrete, then their joint PMF is $p(k_1,\cdots, k_n) = P(X_1 = k_1, \cdots, X_n = k_n)$.

If each $X_k$ is continuous, then there exists $p(x_1, \cdots, x_n)$ such that

$$
F(x_1,\cdots, x_n) = \int_{-\infty}^{x_1}\!\!\cdots \int_{-\infty}^{x_n}p(t_1,\cdots , t_n)\, dt_1\cdots dt_n\,;
$$

$p(x_1,\cdots,x_n)$ is the joint PDF of $X_1, \cdots, X_n$.
Suppose that $X_1, \cdots, X_n$ have the joint PDF $\gamma(x_1,\cdots,x_n)$. The conditional PDF is

$$
\gamma(x_1, \cdots, x_m \,|\, x_{m+1},\cdots, x_n) = \frac{\gamma(x_1, \cdots, x_n)}{\gamma(x_{m+1}, \cdots, x_n)}\,.
$$

Take the transform $Y_1 = g_1(X_1,\cdots,X_n)\,,\ \cdots,\ Y_n = g_n(X_1,\cdots,X_n)$ with inverse transform $X_1 = h_1(Y_1,\cdots,Y_n)\,,\ \cdots,\ X_n = h_n(Y_1,\cdots,Y_n)$. The joint PDF of $Y_1, \cdots, Y_n$ is

$$
p_Y(y_1, \cdots, y_n) = p_X\big(h_1(y_1,\cdots,y_n), \cdots, h_n(y_1,\cdots,y_n)\big)\, \lvert J(y_1, \cdots, y_n)\rvert\,.
$$
**Central Limit Theorem**

**Definition**

Let $\{X_n\}_{n\in\mathbb{Z}^{+}}$ be a sequence of random variables and $X$ be a random variable.

$\{X_n\}_{n\in\mathbb{Z}^{+}}$ converges to $X$ in probability if

$$
\forall \varepsilon > 0\,,\ \lim\limits_{n\rightarrow + \infty}P(\lvert X_n - X \rvert \geqslant \varepsilon) = 0\,.
$$

Denote this by $X_n\stackrel{p}\longrightarrow X$.
$\{X_n\}_{n\in\mathbb{Z}^{+}}$ converges to $X$ in the mean square sense if

$$
E[X_n^2] < +\infty\ \text{and}\ \lim\limits_{n\rightarrow + \infty} E[\lvert X_n - X \rvert^2] = 0\,.
$$

Denote this by $X_n\stackrel{m.s.}\longrightarrow X$.
$\{X_n\}_{n\in\mathbb{Z}^{+}}$ converges to $X$ in distribution if

$$
\lim\limits_{n\rightarrow + \infty}F_{X_n}(x) = F_X(x) \quad \text{for every } x \text{ at which } F_X \text{ is continuous}\,.
$$

Denote this by $X_n\stackrel{d}\longrightarrow X$.
**Property**

If $X_n \stackrel{m.s.}\longrightarrow X$, then $X_n \stackrel{p}\longrightarrow X$. If $X_n \stackrel{p}\longrightarrow X$, then $X_n \stackrel{d}\longrightarrow X$.
**Property**

A sequence $\{X_n\}_{n \in \mathbb{Z}^+}$ of random variables converges to a random variable $X$ in distribution if and only if their characteristic functions satisfy $\Phi_{X_n}(t) \rightarrow \Phi_{X}(t)$ for every $t$.
**Theorem**

Suppose that $\{X_n\}_{n \in \mathbb{Z}^+}$ is a sequence of independent and identically distributed ($i.i.d.$) random variables, and each $X_n$ has expectation $\mu$ and variance $\sigma^2$. Let $S_n = \sum\limits_{k = 1}^{n} X_k$. Then the sequence of random variables $\dfrac{S_n - n\mu}{\sqrt{n}}$ converges in distribution to a Gaussian random variable $X \sim N(0,\sigma^2)$.
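A simulation sketch of the theorem (my addition), using i.i.d. exponential summands with rate $1$, so that $\mu = \sigma^2 = 1$ and the standardized sums should look like $N(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 500, 20_000

# i.i.d. Exponential(1) summands: mu = 1, sigma^2 = 1.
x = rng.exponential(1.0, size=(trials, n))
z = (x.sum(axis=1) - n * 1.0) / np.sqrt(n)    # (S_n - n mu) / sqrt(n)

# The empirical distribution of z should be close to N(0, sigma^2) = N(0, 1).
print(z.mean(), z.var())                      # ≈ 0 and ≈ 1
print(np.mean(z <= 1.0))                      # ≈ 0.841, the standard normal CDF at 1
```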