$$
\left\{
\begin{array}{l}
\Omega \in \mathcal{F}\,, \\
\text{if } A \in \mathcal{F}, \text{ then } \Omega - A \in \mathcal{F}\,, \\
\text{if } A_1, A_2, \cdots \in \mathcal{F}, \text{ then } \bigcup\limits_{i \in \mathbb{Z}^+} A_i \in \mathcal{F}\,.
\end{array}
\right.
$$

The component $P$ is a measure on $\mathcal{F}$, satisfying
$$
\left\{
\begin{array}{l}
P(A) \geqslant 0\,,\ \forall A \in \mathcal{F}\,, \\
P(\Omega) = 1\,, \\
\text{if } A_1, \cdots, A_k, \cdots \in \mathcal{F},\ A_i \cap A_j = \emptyset\ (i \neq j), \text{ then } P\Big(\bigcup\limits_k A_k\Big) = \sum\limits_k P(A_k)\,.
\end{array}
\right.
$$
**Examples**

- $\Omega = \{1, 2, \cdots, N\}$, $\mathcal{F} = \{\text{all subsets of } \Omega\}$, and $P(A) = \dfrac{1}{N}\cdot(\text{cardinality of } A)$.
- $\Omega$ is the interior of the circle with center $(0,0)$ and radius $1$, $\mathcal{F} = \{\text{all measurable subsets of } \Omega\}$, and $P(A) = \dfrac{\text{measure of } A}{\pi}$.
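As a quick numerical illustration of the second example (my addition, not part of the original notes), the sketch below draws points uniformly from the disk $\Omega$ and estimates $P(A) = \text{measure}(A)/\pi$ for a hypothetical event $A$, the quarter of the disk lying in the first quadrant (so the exact answer is $1/4$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly in the unit disk by rejection from the square [-1, 1]^2.
n = 1_000_000
pts = rng.uniform(-1.0, 1.0, size=(n, 2))
inside = pts[np.sum(pts**2, axis=1) < 1.0]      # points of the sample space Omega

# Event A: the part of the disk in the first quadrant (exact P(A) = 1/4).
in_A = (inside[:, 0] > 0) & (inside[:, 1] > 0)

print("estimated P(A):", in_A.mean())           # ~0.25
print("exact     P(A):", (np.pi / 4) / np.pi)   # measure of A divided by pi
```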
**Random Variable**

Let a probability space $(\Omega, \mathcal{F}, P)$ be given. A random variable is a function $X$ from $\Omega$ to the real axis $\mathbb{R}$ such that $\{ \omega \in \Omega : X(\omega) \leqslant C \} \in \mathcal{F}$ for every constant $C$.
The cumulative distribution function (CDF) is defined by $F(x) = P(X \leqslant x)$. It satisfies the following properties:

- $F$ is nondecreasing;
- $F(x) \to 1$ as $x \to +\infty$;
- $F(x) \to 0$ as $x \to -\infty$;
- $F(x)$ is right continuous.
**Discrete Random Variables**

An integer-valued random variable is called a discrete random variable. Its cumulative distribution function is

$$
F(x) = P(X \leqslant x) = \sum\limits_{k \leqslant x} p(k)\,,
$$

where $x \in \mathbb{R}$ and $p(k) = P(X = k)$. The collection $\{p(k)\}$ is called the probability mass function (PMF). It is clear that $p(k) \geqslant 0$ and $\sum\limits_{k} p(k) = 1$.
Its expectation (mean) and variance are

$$
E(X) = \sum\limits_k k\, p(k)\,,\quad Var(X) = \sum\limits_k (k-\mu)^2 p(k)\,,\quad \mu = E(X)\,.
$$
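These two sums are easy to evaluate directly. A minimal sketch (my addition), using a hypothetical PMF, a fair six-sided die:

```python
import numpy as np

# Hypothetical PMF: a fair six-sided die, p(k) = 1/6 for k = 1, ..., 6.
k = np.arange(1, 7)
p = np.full(6, 1 / 6)

mu = np.sum(k * p)                  # E(X)   = sum_k k p(k)
var = np.sum((k - mu) ** 2 * p)     # Var(X) = sum_k (k - mu)^2 p(k)

print(mu, var)                      # 3.5 and 35/12 ≈ 2.9167
```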
**Examples**

Binomial random variable (the number of successes in $n$ independent Bernoulli trials, each with success probability $r$):

$$
p(k) = \begin{dcases} \frac{n!}{k!(n-k)!}(1-r)^{n-k}r^k & k = 0, 1, \cdots, n \\ 0 & \text{otherwise} \end{dcases}
$$

$$
E(X) = nr\,,\quad Var(X) = nr(1-r)\,.
$$
Poisson random variable:

$$
p(k) = \begin{dcases} \frac{\lambda^k}{k!}e^{-\lambda} & k \geqslant 0 \\ 0 & \text{otherwise} \end{dcases}
$$

$$
E(X) = \lambda\,,\quad Var(X) = \lambda\,.
$$
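The stated means and variances can be checked directly from the two PMFs. A sketch (my addition, with arbitrarily chosen parameters; the Poisson sum is truncated where the tail mass is negligible):

```python
import numpy as np
from math import comb, exp, factorial

# Binomial: n = 10 trials, success probability r = 0.3.
n, r = 10, 0.3
k = np.arange(n + 1)
p_binom = np.array([comb(n, j) * r**j * (1 - r)**(n - j) for j in k])
mu = np.sum(k * p_binom)
print(mu, np.sum((k - mu)**2 * p_binom))   # 3.0, 2.1  (= nr, nr(1-r))

# Poisson: lambda = 4, truncated at k = 59 (remaining tail mass is negligible).
lam = 4.0
k = np.arange(60)
p_pois = np.array([lam**j / factorial(j) * exp(-lam) for j in k])
mu = np.sum(k * p_pois)
print(mu, np.sum((k - mu)**2 * p_pois))    # ≈ 4.0, 4.0  (= lambda, lambda)
```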
**Continuous Random Variables**

A real-valued random variable is called a continuous random variable. Its cumulative distribution function (CDF) is $F(x) = P(X \leqslant x)$.

If there exists a non-negative integrable function $p(x)$ such that $F(x) = \displaystyle\int_{-\infty}^x p(t)\,dt$, then $p(x)$ is called the probability density function (PDF). If $p(x)$ is continuous, then $\displaystyle\frac{dF}{dx} = p(x)$.

$p(x)$ satisfies

$$
p(x) \geqslant 0\,,\quad \int_{\mathbb{R}}p(x)\,dx = 1\,,\quad P(x_1<X\leqslant x_2) = \int_{x_1}^{x_2} p(t)\,dt\,.
$$

Its expectation (mean) and variance are

$$
E(X) = \int_{\mathbb{R}}x\,p(x)\,dx\,,\quad Var(X) = \int_{\mathbb{R}}(x-\mu)^2 p(x)\,dx\,,\quad \mu = E(X)\,.
$$
**Examples**

Uniform random variable on $[a, b]$:

$$
p(x) = \begin{dcases} \frac{1}{b-a} & a\leqslant x \leqslant b \\ 0 & \text{otherwise} \end{dcases}
$$

$$
E(X) = \frac{a+b}{2}\,,\quad Var(X) = \frac{1}{12}(b-a)^2\,.
$$
Gaussian random variable:

$$
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

$$
E(X) = \mu\,,\quad Var(X) = \sigma^2\,.
$$

It is also called the normal random variable, denoted by $N(\mu, \sigma^2)$.
Gamma random variable:

$$
p(x) = \frac{1}{\Gamma(\alpha)\, \beta^\alpha}\, x^{\alpha - 1}e^{-x/\beta}\,,\quad (0<x<\infty,\ \alpha>0,\ \beta>0)
$$

$$
E(X) = \alpha\beta\,,\quad Var(X) = \alpha\beta^2\,.
$$
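The closed-form means and variances above can be recovered by integrating the densities numerically; a sketch of this (my addition, with arbitrarily chosen parameters):

```python
from math import gamma, pi, exp, sqrt, inf
from scipy.integrate import quad

def moments(p, lo, hi):
    """E(X) and Var(X) of a density p on (lo, hi), by numerical integration."""
    mu = quad(lambda x: x * p(x), lo, hi)[0]
    var = quad(lambda x: (x - mu) ** 2 * p(x), lo, hi)[0]
    return mu, var

# Uniform on [a, b] = [2, 5]:           expect (3.5, 0.75)
print(moments(lambda x: 1 / 3, 2, 5))

# Gaussian with mu = 1, sigma = 2:      expect (1.0, 4.0)
print(moments(lambda x: exp(-(x - 1)**2 / 8) / (sqrt(2 * pi) * 2), -inf, inf))

# Gamma with alpha = 3, beta = 2:       expect (6.0, 12.0)
print(moments(lambda x: x**2 * exp(-x / 2) / (gamma(3) * 2**3), 0, inf))
```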
**Properties of Expectation and Variance**

The expectation of a function $f(X)$ of a random variable $X$ is

$$
E(f(X)) = \sum_k f(k)\, p(k) \quad (\text{discrete})\,, \qquad E(f(X)) = \int_{\mathbb{R}} f(x)\, p(x)\,dx \quad (\text{continuous})\,.
$$
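For instance, $E(f(X))$ can be computed this way without first finding the distribution of $f(X)$. A short sketch (my addition, using the hypothetical fair-die PMF from earlier and a uniform density):

```python
from math import e, exp
from scipy.integrate import quad

# Discrete case: f(x) = x^2 for a fair die, E(f(X)) = sum_k f(k) p(k).
print(sum(k**2 * (1 / 6) for k in range(1, 7)))           # 91/6 ≈ 15.1667

# Continuous case: f(x) = exp(x) for X uniform on [0, 1], E(f(X)) = integral of f(x) p(x).
print(quad(lambda x: exp(x) * 1.0, 0.0, 1.0)[0], e - 1)   # both ≈ 1.71828
```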
If, for arbitrary constants $a$ and $b$, $P(X \leqslant a,\ Y \leqslant b) = P(X \leqslant a)\, P(Y \leqslant b)$, then the random variables $X$ and $Y$ are called independent.
**Property**

Let $X$ and $Y$ be random variables and $c$ and $d$ be constants. Then:

- (Linearity) $E[cX+dY] = cE(X)+dE(Y)$.
- (Schwarz inequality) $E(XY) \leqslant (E(X^2))^{1/2} (E(Y^2))^{1/2}$.
- (Preservation of order) If $X \leqslant Y$, then $E(X) \leqslant E(Y)$.
- $Var(X) = EX^2 -(EX)^2$.

If $X$ and $Y$ are independent, then

$$
\begin{array}{rcl} E[XY] &=& E[X]E[Y] \\ Var(X+Y) &=& Var(X) + Var(Y) \\ Var(X-Y) &=& Var(X) + Var(Y) \end{array}
$$
**Proof**

$$
\begin{array}{rlr}
Var(X+Y) &= E(X+Y)^2 - (E(X+Y))^2 \\
&= (EX^2+EY^2 + 2E(XY)) - ((EX)^2 + (EY)^2 + 2EXEY) \\
&= (EX^2+EY^2 + 2EXEY) - ((EX)^2 + (EY)^2 + 2EXEY) & (E(XY) = EXEY\ \text{by independence}) \\
&= (EX^2 - (EX)^2) + (EY^2 - (EY)^2) \\
&= Var(X) + Var(Y)\,. & \square
\end{array}
$$
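A quick Monte Carlo sanity check of the last two identities (my addition; the distributions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Independent X ~ Uniform[0, 1] (Var = 1/12) and Y ~ Exponential(1) (Var = 1).
x = rng.uniform(0.0, 1.0, n)
y = rng.exponential(1.0, n)

print(np.var(x + y), np.var(x - y), 1 / 12 + 1.0)   # all three ≈ 1.0833
```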
**Distribution of a function of a random variable**

Suppose that a random variable $Y$ is a function of a random variable $X$, i.e. $Y = f(X)$. Then the PDF of $Y$ can be determined from the PDF of $X$.
**Examples**

Given a random variable $Y = X^2$. If $X$ has a PDF $p_X(x)$, we need to find $p_Y(y)$. Denote the distribution functions of $X$ and $Y$ by $F_X(x)$ and $F_Y(y)$, respectively.

For $y > 0$,

$$
F_Y(y) = P(Y \leqslant y) = P(X^2 \leqslant y) = P(-\sqrt{y} \leqslant X \leqslant \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y})\,.
$$

Since $p_Y(y) = \dfrac{dF_Y(y)}{dy}$, it follows that

$$
p_Y(y) = \begin{dcases} \frac{1}{2\sqrt{y}} \big( p_X(\sqrt{y}) + p_X(-\sqrt{y}) \big) & y > 0 \\ 0 & y \leqslant 0\,. \end{dcases}
$$
When $X$ is a Gaussian random variable $N(0, 1)$, its PDF is $p_X(x) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, and then $Y$ has the PDF

$$
p_Y(y) = \begin{dcases} \frac{1}{\sqrt{2\pi}}\, y^{-\frac{1}{2}} e^{-\frac{y}{2}} & y > 0 \\ 0 & y \leqslant 0\,. \end{dcases}
$$

This is just the $\chi^2$ distribution with one degree of freedom.
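A short simulation (my addition) comparing samples of $Y = X^2$, for standard normal $X$, against the density just derived:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(1_000_000) ** 2           # samples of Y = X^2, X ~ N(0, 1)

# Derived density of Y for y > 0.
p_Y = lambda t: np.exp(-t / 2) / np.sqrt(2 * np.pi * t)

# Compare empirical interval probabilities with a midpoint-rule integral of p_Y.
for a, b in [(0.1, 0.5), (0.5, 1.0), (1.0, 2.0)]:
    empirical = np.mean((a < y) & (y <= b))
    edges = np.linspace(a, b, 1001)
    mids = (edges[:-1] + edges[1:]) / 2
    from_density = np.sum(p_Y(mids)) * (edges[1] - edges[0])
    print(f"P({a} < Y <= {b}):  empirical {empirical:.4f}   from density {from_density:.4f}")
```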
**Theorem**

Let a random variable $X$ have the PDF $p_X(x)$ and let $g(x)$ be a differentiable function on $\mathbb{R}$. If $g'(x) > 0$ for $x \in \mathbb{R}$, and $a = \lim\limits_{x \to -\infty} g(x)$, $b = \lim\limits_{x \to +\infty} g(x)$, then $Y = g(X)$ has the PDF

$$
p_Y(y) = \begin{dcases} p_X\big(h(y)\big)\, h'(y) & a < y < b \\ 0 & \text{otherwise} \end{dcases}
\qquad h = g^{-1}\,.
$$
**Proof**

$g(x)$ is strictly increasing on $\mathbb{R}$ with $a < g(x) < b$, so its inverse function exists: $y = g(x)$, $x = h(y) = g^{-1}(y)$. Then

$$
\begin{array}{rll}
F_Y(y) &= P(g(X) \leqslant y) = 1\,, & (y > b) \\
F_Y(y) &= P(g(X) \leqslant y) = 0\,, & (y < a) \\
F_Y(y) &= P(g(X) \leqslant y) = P(X \leqslant g^{-1}(y)) = P(X \leqslant h(y)) = F_X\big(h(y)\big)\,. & (a < y < b)
\end{array}
$$

Furthermore, $p_Y(y) = \dfrac{dF_Y(y)}{dy}$ gives $p_Y(y) = p_X(h(y))\, h'(y)$. $\square$
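As an illustration of the theorem (my addition), take $g(x) = e^x$, so $a = 0$, $b = +\infty$, $h(y) = \ln y$, $h'(y) = 1/y$, and let $X$ be standard normal; the formula gives the lognormal density, which the sketch below checks against simulated samples:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = np.exp(x)                                    # Y = g(X) with g(x) = e^x

# Density from the theorem: p_Y(y) = p_X(h(y)) h'(y), with h(y) = ln y, h'(y) = 1/y.
p_X = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
p_Y = lambda t: p_X(np.log(t)) / t

# Compare P(1 < Y <= 2) from samples with a midpoint-rule integral of p_Y over (1, 2].
empirical = np.mean((1.0 < y) & (y <= 2.0))
edges = np.linspace(1.0, 2.0, 2001)
mids = (edges[:-1] + edges[1:]) / 2
from_density = np.sum(p_Y(mids)) * (edges[1] - edges[0])
print(empirical, from_density)                    # both ≈ 0.256
```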
**Characteristic function**

The characteristic function of a random variable $X$ is defined by

$$
\Phi_X(t) = E(e^{itX}) = \int_{\mathbb{R}}e^{itx}\,dF_X(x) \quad \left( = \int_{\mathbb{R}}e^{itx}p_X(x)\,dx \ \ \text{when $X$ has a PDF} \right)
$$
**Example**

Let $X$ be an exponential random variable with PDF

$$
p(x) = \begin{dcases} \lambda e^{-\lambda x}\,, & x \geqslant 0 \\ 0\,, & x < 0 \end{dcases}
$$

Its characteristic function is

$$
\Phi_X(t) = \int_{\mathbb{R}}p(x)e^{itx}\,dx = \lambda\int_0^{\infty}e^{-(\lambda-it)x}\,dx = \frac{\lambda}{\lambda-it}\,.
$$
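A quick numerical check of this formula (my addition, with $\lambda = 2$ and $t = 1.5$ chosen arbitrarily), integrating the real and imaginary parts separately:

```python
import numpy as np
from scipy.integrate import quad

lam, t = 2.0, 1.5

# Phi_X(t) = integral of e^{itx} p(x) dx, split into real and imaginary parts.
re = quad(lambda x: np.cos(t * x) * lam * np.exp(-lam * x), 0, np.inf)[0]
im = quad(lambda x: np.sin(t * x) * lam * np.exp(-lam * x), 0, np.inf)[0]

print(complex(re, im), lam / (lam - 1j * t))     # both ≈ 0.64 + 0.48j
```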
**Example**

Let $X$ be a normal random variable with mean $\mu$ and variance $\sigma^2$. Then its characteristic function is $\Phi_X(t) = e^{i\mu t - \frac{1}{2}t^2\sigma^2}$.
**Theorem**

Let $X$ and $Y$ be two independent random variables with PDFs $p(x)$ and $q(y)$, and let $Z = X + Y$. Then

$$
\Phi_Z(t) = \Phi_X(t)\, \Phi_Y(t)\,,\qquad r(z) = (p*q)(z)\,,
$$

where $r(z)$ is the PDF of $Z$.

**Property**

Let $W = cX$, where $c$ is a constant (assume $c > 0$ for the computation below). Then
$$
P(a<W<b) = P\left(\frac{a}{c} < X < \frac{b}{c}\right) = \int_{\frac{a}{c}}^{\frac{b}{c}} p(x)\,dx = \int_a^b\frac{1}{c}\,p\Big(\frac{x}{c}\Big)\,dx\,.
$$

This implies that $W$ has the PDF $\beta(x) = \dfrac{1}{c}\,p\Big(\dfrac{x}{c}\Big)$, and therefore

$$
\Phi_W(t) = E(e^{itW}) = \int_{\mathbb{R}} e^{itx}\, \frac{1}{c}\,p\Big(\frac{x}{c}\Big)\,dx = \int_{\mathbb{R}} e^{ictx}\,p(x)\,dx = \Phi_X(ct)\,.
$$
**Theorem**

Let the random variable $X_1$ be $N(\mu_1, \sigma_1^2)$ and $X_2$ be $N(\mu_2, \sigma_2^2)$. If $X_1$ and $X_2$ are independent, then the random variable $X = X_1 + X_2$ is $N(\mu, \sigma^2)$ with

$$
\mu = \mu_1 + \mu_2\,,\quad \sigma^2 = \sigma_1^2 + \sigma_2^2\,.
$$
**Proof**

$$
\Phi_X(t) = \Phi_{X_1}(t)\,\Phi_{X_2}(t)\,,\qquad
\Phi_{X_1}(t) = e^{i\mu_1 t - t^2\sigma_1^2/2}\,,\qquad
\Phi_{X_2}(t) = e^{i\mu_2 t - t^2\sigma_2^2/2}\,,
$$

$$
\Phi_X(t) = e^{i(\mu_1+\mu_2)t - t^2(\sigma_1^2 + \sigma_2^2)/2}\,.
$$

This implies that $X \sim N(\mu_1 + \mu_2,\ \sigma_1^2+\sigma_2^2)$. $\square$
**Corollary**

If $X_1, \cdots, X_n$ are independent Gaussian random variables, then any linear combination $a_1X_1+\cdots+a_nX_n$ is a Gaussian random variable.
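A simulation sketch (my addition; the parameters are chosen arbitrarily) checking the mean and variance of a sum of two independent Gaussians:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x1 = rng.normal(loc=1.0, scale=2.0, size=n)     # N(1, 4)
x2 = rng.normal(loc=-3.0, scale=1.5, size=n)    # N(-3, 2.25)
s = x1 + x2

# Expect mean mu1 + mu2 = -2 and variance sigma1^2 + sigma2^2 = 6.25.
print(s.mean(), s.var())
```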
**Jointly distributed random variables**

The joint distribution function of two random variables is defined by

$$
F(x,y) = P(X \leqslant x,\ Y \leqslant y)\,.
$$

The marginal distribution functions are

$$
F_X(x) = F(x,+\infty)\,,\qquad F_Y(y) = F(+\infty,y)\,.
$$

If $X$ and $Y$ are independent, then

$$
F(x,y) = F_X(x)\,F_Y(y)\,.
$$
Let $X$ and $Y$ be discrete random variables with $P(X = k) = p(k)$ and $P(Y = k) = g(k)$. The joint probability mass function (PMF) of $X$ and $Y$ is defined by

$$
\gamma(k,l) = P(X = k,\ Y = l)\,.
$$

**Property**

$$
\sum\limits_l\gamma(k,l) = p(k)\,,\qquad \sum\limits_k\gamma(k,l) = g(l)\,.
$$
In fact,

$$
\sum\limits_l\gamma(k,l) = \sum\limits_l P(X = k,\ Y = l) = P(X = k,\ Y \in \mathbb{Z}) = p(k)\,.
$$
If $X$ and $Y$ are independent, then

$$
P(X = k,\ Y = l) = P(X = k)\,P(Y = l)\,,\qquad \text{i.e.}\quad \gamma(k,l) = p(k)\,g(l)\,.
$$
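A small numerical sketch (my addition, using a hypothetical $2\times 3$ joint PMF) computing the marginals and testing whether $\gamma(k,l) = p(k)\,g(l)$:

```python
import numpy as np

# Hypothetical joint PMF gamma[k, l] on {0, 1} x {0, 1, 2}.
gamma = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])

p = gamma.sum(axis=1)    # marginal PMF of X: sum over l
g = gamma.sum(axis=0)    # marginal PMF of Y: sum over k

print(p, g)                                  # [0.4 0.6] and [0.25 0.5 0.25]
print(np.allclose(gamma, np.outer(p, g)))    # True: this gamma factorizes, so X and Y are independent
```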
Consider two continuous random variables $X$ and $Y$ with PDFs $p(x)$ and $g(y)$. If a non-negative integrable function $\gamma(x,y)$ exists such that

$$
F(x,y) = \int_{-\infty}^{x}\!\!\int_{-\infty}^{y}\gamma(s,t)\,dt\,ds\,,
$$

then $\gamma(x,y)$ is called the joint PDF of $X$ and $Y$.

Given $Y = y$, the conditional PDF of $X$ is

$$
\gamma(x\,|\,y) = \frac{\gamma(x,y)}{g(y)}\,.
$$
Let $u(X)$ be a function of $X$. Given $Y = y$, the conditional expectation of $u(X)$ is defined as

$$
E[u(X)\,|\,y] = \int_{\mathbb{R}}u(x)\,\gamma(x\,|\,y)\,dx\,.
$$

In particular,

$$
E[X\,|\,y] = \int_{\mathbb{R}}x\,\gamma(x\,|\,y)\,dx\,.
$$
**Property**

Let $X$ and $Y$ have the joint PDF $\gamma(x,y)$. Then $p(x) = \displaystyle\int_{\mathbb{R}} \gamma(x,y)\,dy$ and $g(y) = \displaystyle\int_{\mathbb{R}} \gamma(x,y)\,dx$. If $X$ and $Y$ are independent, then their joint PDF is $\gamma(x,y) = p(x)\,g(y)$.
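A numerical sketch (my addition, using a hypothetical joint density on the unit square) recovering the marginals by integration and showing that this particular density does not factorize:

```python
from scipy.integrate import quad

# Hypothetical joint PDF on the unit square: gamma(x, y) = x + y (integrates to 1).
gamma = lambda x, y: x + y

# Marginals by integrating out the other variable; analytically p(x) = x + 1/2, g(y) = y + 1/2.
p = lambda x: quad(lambda y: gamma(x, y), 0.0, 1.0)[0]
g = lambda y: quad(lambda x: gamma(x, y), 0.0, 1.0)[0]
print(p(0.3), g(0.7))                      # 0.8 and 1.2

# gamma(x, y) != p(x) g(y), so X and Y are not independent here.
print(gamma(0.3, 0.7), p(0.3) * g(0.7))    # 1.0 vs 0.96
```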
The covariance of two random variables is

$$
Cov(X,Y) = E[(X-EX)(Y-EY)] = E(XY) - EX\,EY\,.
$$

**Property**

Let $X, Y, Z$ be random variables and $c, d$ be constants. Then

$$
Cov(c,X) = 0\,,\qquad Cov(cX+dY,\,Z) = c\,Cov(X,Z)+d\,Cov(Y,Z)\,.
$$

If $X$ and $Y$ are independent, then $Cov(X,Y) = 0$.
**Theorem**

Let two random variables $X_1$ and $X_2$ have the joint density function $p_{X_1X_2}(x_1,x_2)$. Denote $A = \{(x_1,x_2)\in\mathbb{R}^2 : p_{X_1X_2}(x_1, x_2) \neq 0 \}$. Suppose two differentiable bivariate functions $g_1(x_1, x_2)$, $g_2(x_1,x_2)$ define a one-to-one transform

$$
U:\ y_1 = g_1(x_1, x_2)\,,\quad y_2 = g_2(x_1,x_2)
$$

of $A$ onto $B = U(A)$, and denote the inverse transform by $U^{-1}:\ x_1 = h_1(y_1,y_2)\,,\ x_2 = h_2(y_1,y_2)$. Then $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$ have the joint PDF

$$
p_{Y_1Y_2}(y_1,y_2) = \begin{dcases} p_{X_1X_2}\big(h_1(y_1,y_2),\, h_2(y_1,y_2)\big)\,\lvert J(y_1,y_2)\rvert & (y_1,y_2)\in B \\ 0 & \text{otherwise} \end{dcases}
$$

where $J(y_1,y_2) = \dfrac{\partial(h_1, h_2)}{\partial(y_1, y_2)}$ is the Jacobian determinant of the inverse transform.
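An illustration (my addition): for independent standard normals $X_1, X_2$ and the transform $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, the inverse is $h_1 = (y_1+y_2)/2$, $h_2 = (y_1-y_2)/2$ with $\lvert J\rvert = 1/2$. The sketch compares a probability computed from the resulting density with a simulation.

```python
import numpy as np
from scipy.integrate import dblquad

# Joint PDF of independent standard normals X1, X2.
p_X = lambda x1, x2: np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)

# Transform Y1 = X1 + X2, Y2 = X1 - X2; inverse h1 = (y1+y2)/2, h2 = (y1-y2)/2, |J| = 1/2.
p_Y = lambda y1, y2: p_X((y1 + y2) / 2, (y1 - y2) / 2) * 0.5

# P(0 < Y1 <= 1, 0 < Y2 <= 1) two ways: integrating p_Y, and by simulation.
from_density = dblquad(lambda y2, y1: p_Y(y1, y2), 0, 1, 0, 1)[0]

rng = np.random.default_rng(5)
x1, x2 = rng.standard_normal((2, 1_000_000))
y1, y2 = x1 + x2, x1 - x2
empirical = np.mean((0 < y1) & (y1 <= 1) & (0 < y2) & (y2 <= 1))
print(from_density, empirical)        # both ≈ 0.068
```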
The joint distribution of random variables $X_1, \cdots, X_n$ is defined as

$$
F(x_1, \cdots, x_n) = P(X_1 \leqslant x_1, \cdots, X_n \leqslant x_n)\,.
$$

If each $X_k$ is discrete, then their joint PMF is $p(k_1,\cdots, k_n) = P(X_1 = k_1, \cdots, X_n = k_n)$.

If each $X_k$ is continuous, then there exists $p(x_1, \cdots, x_n)$ such that

$$
F(x_1,\cdots, x_n) = \int_{-\infty}^{x_1}\!\!\cdots \int_{-\infty}^{x_n}p(t_1,\cdots , t_n)\, dt_1\cdots dt_n\,;
$$

$p(x_1,\cdots,x_n)$ is the joint PDF of $X_1, \cdots, X_n$.
Suppose that $X_1, \cdots, X_n$ have the joint PDF $\gamma(x_1,\cdots,x_n)$. The conditional PDF is

$$
\gamma(x_1, \cdots, x_m \,|\, x_{m+1},\cdots, x_n) = \frac{\gamma(x_1, \cdots, x_n)}{\gamma(x_{m+1}, \cdots, x_n)}\,.
$$

Take the transform $Y_1 = g_1(X_1,\cdots,X_n)\,,\ \cdots,\ Y_n = g_n(X_1,\cdots,X_n)$ with inverse transform $X_1 = h_1(Y_1,\cdots,Y_n)\,,\ \cdots,\ X_n = h_n(Y_1,\cdots,Y_n)$. The joint PDF of $Y_1, \cdots, Y_n$ is

$$
p_Y(y_1, \cdots, y_n) = p_X\big(h_1(y_1,\cdots,y_n), \cdots, h_n(y_1,\cdots,y_n)\big)\, \lvert J(y_1, \cdots, y_n)\rvert\,.
$$
**Central Limit Theorem**

**Definition**

Let $\{X_n\}_{n\in\mathbb{Z}^{+}}$ be a sequence of random variables and $X$ be a random variable.

$\{X_n\}_{n\in\mathbb{Z}^{+}}$ converges to $X$ in probability if

$$
\forall \varepsilon > 0\,,\ \lim\limits_{n\rightarrow + \infty}P(\lvert X_n - X \rvert \geqslant \varepsilon) = 0\,.
$$

Denote this by $X_n\stackrel{p}\longrightarrow X$.
$\{X_n\}_{n\in\mathbb{Z}^{+}}$ converges to $X$ in the mean square sense if

$$
E[X_n^2] < +\infty\ \text{and}\ \lim\limits_{n\rightarrow + \infty} E[\lvert X_n - X \rvert^2] = 0\,.
$$

Denote this by $X_n\stackrel{m.s.}\longrightarrow X$.
$\{X_n\}_{n\in\mathbb{Z}^{+}}$ converges to $X$ in distribution if

$$
\lim\limits_{n\rightarrow + \infty}F_{X_n}(x) = F_X(x) \quad \text{for every } x \text{ at which } F_X \text{ is continuous}\,.
$$

Denote this by $X_n\stackrel{d}\longrightarrow X$.
**Property**

If $X_n \stackrel{m.s.}\longrightarrow X$, then $X_n \stackrel{p}\longrightarrow X$. If $X_n \stackrel{p}\longrightarrow X$, then $X_n \stackrel{d}\longrightarrow X$.
**Property**

A sequence $\{X_n\}_{n \in \mathbb{Z}^+}$ of random variables converges to a random variable $X$ in distribution if and only if their characteristic functions satisfy $\Phi_{X_n}(t) \rightarrow \Phi_{X}(t)$ for every $t$.
**Theorem**

Suppose that $\{X_n\}_{n \in \mathbb{Z}^+}$ is a sequence of independent and identically distributed ($i.i.d.$) random variables, and each $X_n$ has expectation $\mu$ and variance $\sigma^2$. Let $S_n = \sum\limits_{k = 1}^{n} X_k$. Then the sequence of random variables $\dfrac{S_n - n\mu}{\sqrt{n}}$ converges in distribution to a Gaussian random variable $X \sim N(0,\sigma^2)$.
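A simulation sketch of the theorem (my addition), using i.i.d. exponential summands with rate $1$, so that $\mu = \sigma^2 = 1$ and the standardized sums should look like $N(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 500, 20_000

# i.i.d. Exponential(1) summands: mu = 1, sigma^2 = 1.
x = rng.exponential(1.0, size=(trials, n))
z = (x.sum(axis=1) - n * 1.0) / np.sqrt(n)    # (S_n - n mu) / sqrt(n)

# The empirical distribution of z should be close to N(0, sigma^2) = N(0, 1).
print(z.mean(), z.var())                      # ≈ 0 and ≈ 1
print(np.mean(z <= 1.0))                      # ≈ 0.841, the standard normal CDF at 1
```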