Probability and Random Processes Cheat Sheet

Berkeley, USA

This page is a collection of all theorems taught in EECS126: Probability and Random Processes, Spring 2021. A good reference!

Link to PDF version here.

Probability Basics#

Conditional Probability $$P(A | B) = \frac{P(A \cap B)}{P(B)}, P(B) > 0$$
Total Probability Theorem $$P(B) = \sum_{i=1}^{n} P(A_i)P(B | A_i)$$
Bayes Rules $$P(A_i | B) = \frac{P(A_i)P(B|A_i)}{P(B)}$$
Union Bound $$P(\bigcup_{i=1}^{\infty} Ai) \leq \sum{i=1}^{\infty}P(A_i)$$
Independence $$P(A | B) = P(A) \iff \text{A and B are independent} $$
Conditional Independence $$P(A \cap B | C) = P(A | C) P(B | C) \implies \text{A and B conditionally independent}$$
Independence of Several Events $$P(\cap_{i \in S} Ai) = \prod{i \in S} P(A_i)$$
Counting Permutations of Size $k$ in $n$ Objects $$^nP_k = \frac{n!}{(n-k)!}$$
Counting Ways to Choose $k$ Objects in $n$ Objects $${n\choose k} = \frac{n!}{k!(n-k)!}$$
Counting Ways To Partition $n$ Objects into $n^i$ Groups $${n\choose n_1, n_2… n_k} = \frac{n!}{n_1! n_2!…n_k!}$$

Discrete Random variables#

Bernoulli Random Variable $$ \begin{aligned} P(X = k) = \begin{cases} 1 & \text{ with probability } p \newline 0 & \text{ with probability } 1-p \end{cases} \end{aligned} $$

$$\mathbb{E}(X) = p$$ $$var(X) = p(1-p)$$

Binomial Random Variable $$P(X = k) = {n\choose k} p^k (1-p)^{n-k}$$ $$\mathbb{E}(X) = np$$ $$var(X) = np(1-p)$$
Geometric Random Variable $$P(X = k) = (1-p)^{k-1} p$$ $$\mathbb{E}(X) = \frac{1}{p}$$ $$var(X) = \frac{1-p}{p^2}$$
Poisson Random Variable $$P(X_\lambda = k) = \frac{e^{-\lambda}\lambda^k}{k!}$$ $$\mathbb{E}(X) = \lambda $$ $$var(X) = \lambda$$
Linearity of a Poisson RV $$Poisson(\lambda) + Poisson(\mu) \sim Poisson(\lambda + \mu)$$
Uniform Random Variable \begin{equation} P(X = k) = \begin{cases} \frac{1}{b-a+1} & \ k \in [a,b]
0 & \text{ otherwise} \end{cases} \end{equation}
Joint PMFs $$P_{X, Y} (x, y) = \Pr(X = x, Y = y)$$ $$PX(x) = \sum{y} P_{X, Y}(x, y) \text{ and vice versa}$$
Conditional PMFs $$P{X|A}(X = x|A) = \frac{P({X = x} \cap A)}{P(A)}$$ $$P{XY}(x|y) = \frac{P{X,Y}(x,y)}{P_Y(y)}$$

Expectation, Variance and Covariance#

Expectation $$\mathbb{E}(X) = \sum_{x} xP(X = x)$$
Law of The Unconscious Statistician $$\mathbb{E}(g(X)] = \sum_{x} g(x)P(X = x)$$
Variance $$var(X) = \mathbb{E}[(X - \mathbb{E}(X))^2] \geq 0$$
Standard Deviation $$\sigma = \sqrt{var}$$
Linearity of Expectation $$\mathbb{E}(aX + bY) = a\mathbb{E}(X) + b\mathbb{E}(Y)$$
Expectation of Joint Distribution $$\mathbb{E}(g(X, Y)) = \sum{x}\sum{y} g(x, y)P_{X, Y}(x, y)$$
Variance of a Sum of Random Variables $$var(X+Y) = var(X) + var(Y) + 2cov(X, Y)$$
Conditional Expectation $$\mathbb{E}(X | Y = y) = \sum{x} x\ P{X|Y}(x|y)$$
Total Expectation Theorem $$\mathbb{E}(X) = \sum_{y} P_Y(y)\mathbb{E}(X|Y=y)$$
Iterated Expectation $$\mathbb{E}(X) = \mathbb{E}(\mathbb{E}(X|Y))$$
Tower Property $$\mathbb{E}[\mathbb{E}[X|Y]g(Y)] = \mathbb{E}[Xg(Y)]$$
Expectation of Independent Variables $$\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y) \text{ if X, Y independent}$$
Covariance $$cov(X,Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)$$
Correlation Coefficient $$\rho(X, Y) = \frac{cov(X,Y)}{\sqrt{Var(X)Var(Y)}}$$ $$|\rho| \leq 1$$
Variance of Two Independent Variables

$$Var[XY] = \mathbb{E}[X^2]\mathbb{E}[Y^2] - \mathbb{E}[X]^2\mathbb{E}[Y]^2$$

Law of Total Variance $$var(X) = Var(\mathbb{E}(X|Y)) + \mathbb{E}(var(X|Y))$$

Continuous Random Variables#

Probability Density Functions $$P(X \in [a, b]) = \int_{a}^{b} f_X(x)dx$$
Cumulative Ditribution Function $$FX(x) = \int{-\infty}^{x} f(t)dt$$
Uniform Distribution $$ \begin{aligned} f_X(x) &= \frac{1}{b-a},\ a<x<b \newline \mathbb{E}(X) &= \frac{a+b}{2} \newline var(X) &= \frac{(b-a)^2}{12} \end{aligned} $$
Exponential Distribution $$ \begin{aligned} f_X(x) &= \lambda e^{-\lambda x},\ x>0 \newline F_X(x) &= 1 - e^{-\lambda x} \newline \mathbb{E}(X) &= \frac{1}{\lambda} \newline var(X) &= \frac{1}{\lambda^2} \end{aligned} $$
Gaussian Distribution $$f_X(X) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^²⁄₂\sigma^2}$$
Sum of Two Gaussian Variables $$aN(\mu_1, \sigma^2_1) + bN(\mu_2, \sigma^2_2) \sim N(a\mu_1 + b\mu_2, a^2\sigma^2_1 + b^2\sigma_2^2)$$
Joint PDFs $$f{X|Y}(x|y) = \frac{f{X, y}(x, y)}{f_Y(y)}$$
Independence of Continuous Variables $$f_{X, Y}(x, y) = f_x(x)f_Y(y)$$

Order Statistics#

Smallest RV in a set of RVs $$\text{Let } Y = \min_{1 \leq k \leq n} X_k \text{ , iid with CDF $F_X$}$$ $$F_Y(y) = 1 - (1 - F_X(y))^n$$
Largest RV in a set of RVs $$\text{Let } Y = \max_{1 \leq k \leq n} X_k \text{ , iid with CDF $F_X$}$$ $$F_Y(y) = (F_X(y))^n$$

Convolution#

Discrete Convolution $$ \begin{aligned} pZ(z) &= P(X+Y=z) = \sum{x}P(X=x, Y=z-x) \newline &= \sum_{x}P_x(x)P_Y(z-x) \text{ if X, Y independent} \end{aligned} $$
Continuous Convolution $$ fZ(z) = \int{-\infty}^{\infty} f_X(x)f_Y(z-x)dx $$

Moment Generating Function#

MGF for a RV $$ \begin{aligned} Mx(s) &= \mathbb{E}[e^{sx}] \newline &=\int{-\infty}^{\infty} e^{sx} f_X(x)dx \end{aligned} $$
Derivative of an MGF $$ \frac{d^nM(s)}{ds^n} |_{s=0} = \int x^nf(x)dx = \mathbb{E}[X^n] $$
MGF of a Poisson RV $$M(s) = e^{\lambda (e^s-1)}$$
MGF of a Exponential RV $$M(s) = \frac{\lambda}{\lambda - s}\text{ , $s < \lambda$}$$
MGF of the Standard Normal Gaussian RV $$M(s) = e^{s^²⁄₂}$$
Moments of Standard Normal RV $$ \mathbb{E}(X^m) = \begin{cases} 0 & \text{ , m odd} \newline 2^{-m/2}\frac{m!}{(m/2)!} & \text{ , m even} \end{cases} $$
MGF of a Geometric RV $$M(s) = \frac{pe^s}{1- (1-p)e^s}$$
MGF of a Bernoulli RV $$M(s) = 1 - p + pe^s$$
MGF of a Binomial RV $$M(s) = (1 - p + pe^s)^n$$
MGF of a Uniform RV $$ \begin{equation} M(s) = \begin{cases} \frac{e^{bs} - e^{as}}{s(b-a)} &\ s \neq 0 \newline 1 &\ s = 0 \end{cases} \end{equation} $$
MGF of a Sum of RVs $$ \begin{aligned} \text{Let } Z &= \sum X_i \newline MZ(s) &= \prod M{X_i}(s) \end{aligned} $$
MGF of a $Y = a^TX$, X is Gaussian Vector $$M_Y(s) = M_X(sa) = \exp{(s(a^T\mu_x) +\frac{1}{2}s^2a^T\Sigma a)}$$

Bounds#

Markov Inequality

$$P(X \geq a) \leq \frac{\mathbb{E}[X]}{a}$$

Chebyshev’s Inequality $$P(|X - \mu| \geq c) \leq \frac{\sigma^2}{c^2}$$
Chernoff Bound $$P(X \geq a) \leq \frac{\mathbb{E}[e^{sx}]}{e^{sa}} \text{ , $s > 0$}$$ $$P(X \leq a) \leq \frac{M(s)}{e^{sa}} \text{ , $s \leq 0$}$$
Jensen Inequality $$f(\mathbb{E}(x)) \leq \mathbb{E}[f(x)] \text{ , f is convex, $f”(x) > 0$}$$
Weak Law of Large Numbers

$$\lim{n \xrightarrow{} \infty} P(|\frac{1}{n}\sum{i=1}^{n}X_i - \mathbb{E}[X]| \geq \epsilon) = 0$$

Strong Law of Large Numbers $$P(\lim_{n\xrightarrow{} \infty} M_n = \mu) = 1$$
Central Limit Theorem $$ \begin{aligned} \text{Define } Z &= \frac{S_n - n\mu}{\sqrt{n}\sigma} \newline F_Z(z) &\xrightarrow{} \phi(z) \end{aligned} $$

Convergences#

Almost Sure Convergence $$P(\lim_{n \xrightarrow{} \infty} X_n = X) = 1$$
Convergence in Probability $$\lim_{n \xrightarrow{} \infty} P(|X_n - X| \geq \epsilon) = 0$$
Convergence in Distribution $$\lim{n \xrightarrow{} \infty} F{X_n}(x) = F_X(x)\ \ \forall x$$

Entropy#

Entropy $$H(X) = - \sum_{i=1}^{n} p_i \ln(p_i)$$
Chain Rule of Entropy $$H(X, Y) = H(Y) + H(X|Y) = H(X) + H(Y|X)$$
Convergence of Joint Entropy $$-\frac{1}{n} \log p(x_1, x_2… x_n) \xrightarrow{p} H(X)$$

Information Theory#

Source Coding Theorem
As $n \xrightarrow{} \infty$, consider N iid RVs with entropy $H(X)$. You can compress this into no more and no less than $NH(X)$ bits without sending over extra bits or losing information.
Channel Coding Theorem
Define channel capacity as the $\frac{ \text{# of message input bits}}{\text{# of bits transmitted}}$. Any sequence of codes with error probability $p \xrightarrow{} 0$ has a rate R $<$ capacity.
Capacity of a BEC $$C = 1 - p$$
Capacity of a BSC $$C = 1 - H(p)$$
Asymptotic Equipartition (AEP) Theorem $$ \begin{array}[1] A_\epsilon^{(n)} = {(x_1, x_2… x_n): p(x_1, x_2… x_n) \geq 2^{-n(H(X) + \epsilon)}} \newline p((x_1, x_2… xn) \in A\epsilon^{(n)}) \xrightarrow{n \xrightarrow{} \infty} 1 \newline |A_\epsilon^{(n)}| \leq 2^{n(H(X) + \epsilon)} \end{array} $$
Average Number of Bits Transmitted $$\mathbb{E}[number\ bits] \leq n(H(X) + \epsilon)$$
Mutual Information $$I(X;Y) = \sum p{XY}(x, y)\log\frac{P{XY}(x, y)}{P_X(x)P_y(Y)}$$
Mutual Information and Entropy $$I(X;Y) = H(X) + H(Y) - H(X, Y)$$
Capacity of A Channel $$C = \max_{p_x} I(X;Y)$$
Upper Bound on Probability of Error in BEC $$P(\text{error}) = 2^{-n(1-p) + L(n)}$$ $$\text{ where n = # bits of bits sent and L = # of bits in message}$$

Discrete Time Markov Chains#

Markov Property $$P(X_{n+1} | X_n…X1) = P(X{n+1} | X_n)$$
Chapman Komogorov Equations $$P{ij}^n = [P^n]{ij}$$
Periodicity $$d(i) = gcd\ {n \geq 1: P_{ii}^n > 0}$$
Stationary Distribution $$\pi P = \pi$$
Hitting Time \begin{equation} \beta(i) = \begin{cases} 1 + \sumj p{ij}\beta_j & i \notin A
0 & i \in A \end{cases} \end{equation}
Detailed Balance Equations $$\pii P{ji} = \pii P{ij},\ \ i, j \in S$$
Stationary Distribution of an Undirected Graph $$\pi(i) = \frac{d(i)}{\sum_{j}d(j)} = \frac{degree(i)}{2E}$$

Poisson Processes#

Number of arrivals within $t$ $$P(N_t = n)\sim Poisson(\lambda t) = \frac{e^{-\lambda t}(\lambda t)^n}{n!}$$
Inter-arrival Time $$S_i \sim Exp(\lambda)$$
Sum of Inter-arrival Times: Erlang Distribution $$f_{T_n}(s) = \frac{\lambda e^{-\lambda s}(\lambda s)^{n-1}}{(n-1)!}$$
Memoryless Property $$N_{Ti} - N{T_{i-1}} \sim Poisson(\lambda (ti - t{i-1}))$$
Poisson Merging $$PP(\lambda_1) + PP(\lambda_2) \sim PP(\lambda_1 + \lambda_2)$$
Poisson Splitting $$P(\min{T_a, T_b} = T_a) = \frac{\lambda_a}{\lambda_a + \lambda_b}$$
Random Incidence Paradox $$L \sim Erlang(2, k)$$

Continuous Time Markov Chains#

Temporal Homogeneity $$P(X_{t+\tau}\ |\ X_t = i, X_s = is \forall\ 0 \leq s < t) = P(X\tau = j | X_0 = i)$$
Rate of Self-Transition $$Q(i, i) = -\sum_{j\neq i} Q(i, j)$$
Balance Equations $$\sum_{i\neq j} \pi_i Q(i, j) = \pij \sum{k \neq j} Q(j, k)$$
Uniformization (Simulated DTMC) $$ \begin{aligned} \text{Let } q &= \text{sup}\ q(i) \text{ , strongest self-loop} \newline R &= I + \frac{1}{q}Q \end{aligned} $$
Hitting Time $$ \begin{equation} \beta(i) = \begin{cases} \frac{1}{q(i)} + \sum_{j \neq i} \frac{Q(i, j)}{q(i)} \beta(j) & i \notin A \newline 0 & i \in A \end{cases} \end{equation} $$

Random Graph#

Probability of a Random Graph Being Given Graph $$P(G = G_0) \sim Binomial({n\choose 2}, p)$$
Distribution of Degree of Vertex in Random Graph $$P(D = d) \sim Binomial(n-1, p) \xrightarrow{n \xrightarrow{} \infty} Poisson((n-1)p)$$
Erdos Renyi Theorem $$ \begin{aligned} Let\ p(n) = \lambda\frac{\ln(n)}{n} \newline P(G \text{ is connected}) \xrightarrow{n \xrightarrow{} \infty} 0 &\ ,\ \lambda < 1 \newline P(G \text{ is connected})\xrightarrow{n \xrightarrow{} \infty} 1 &\ ,\ \lambda > 1 \end{aligned} $$
Combining Graphs $$P(e \in G = G_1 \cup G_2 | e \in G_1 \cup e \in G2) = p_1 + p_2 - p_1p_2$$

Statistical Inference#

Bayes Rule Redux $$P(X = x | Y = y) = \frac{P{Y|X}(y|x) \pi(x)}{\sum{i} P_{Y|X}(y|i) \pi(i)}$$
Maximum A-Posteriori Estimation (MAP) $$\text{MAP}(X|Y=y) = \max{x} P{X|Y}(x|y) = \max{x} P{Y|X}(y|x)\pi(x)$$
Maximum Likelihood Estimation (MLE)

$$MLE(X|Y=y) = \maxx P{Y|X}(y|x)$$

Likelihood Ratio $$L(y) = \frac{P{Y|X}(y|1)}{P{Y|X}(y|0)}$$
MLE of a BSC $$ \begin{equation} MLE(X|Y=y) = \begin{cases} y & \text{if }p \leq ¹⁄₂ \newline 1-y & \text{if }p > ¹⁄₂ \end{cases} \end{equation} $$ $$ \begin{equation} MLE(X|Y=y) = \begin{cases} 1 & \text{if } L(y) \geq 1 \newline 0 & \text{if } L(y) < 1 \end{cases} \end{equation} $$
MAP of a BSC $$ \begin{equation} \text{MAP}(X | Y = y) = \begin{cases} 0 &\ \text{if } L(y) < \frac{\pi_0}{\pi_1} \newline 1 &\ \text{if } L(y) \geq \frac{\pi_0}{\pi_1} \end{cases} \end{equation} $$
Likelihood Ratio for $X\in {0, 1}$ with Gaussian Noise $$L(y) = \exp{[\frac{y}{\sigma^2} - \frac{1}{2\sigma^2}]}$$
MAP for $X\in {0, 1}$ with Gaussian Noise $$ \text{MAP}(X | Y = y) = \begin{cases} 0 &\ \text{if } L(y) < \frac{\pi_0}{\pi_1} = y \geq \frac{1}{2} + \sigma^2 log(\frac{\pi_0}{\pi_1}) \newline 1 &\ \text{if } L(y) \geq \frac{\pi_0}{\pi_1} \end{cases} $$
MLE for $X\in {0, 1}$ with Gaussian Noise $$ \begin{equation} MLE(X|Y=y) = \begin{cases} 1 & \text{if } L(y) \geq 1 = y \geq \frac{1}{2} \newline 0 & \text{if } L(y) < 1 \newline \end{cases} \end{equation} $$

Binary Error Testing#

Neyman-Pearson Lemma $$ \begin{aligned} &\text{Minimizes P(false negatives) with P(false positive) $\leq \beta$} \newline &\hat{X} = \begin{cases} 1\ & L(y) > \lambda \newline 0\ & L(y) < \lambda \newline Bern(\gamma)\ & L(y) = \lambda \end{cases} \newline &\text{Setting } P(\hat{X} = 1 | X = 0) = \beta \end{aligned} $$

Estimations#

Mean Square Error (MSE) $$\mathbb{E}[(X - \hat{X}(Y))^2]$$
Minimum Mean-Squared Estimation (MMSE) $$\text{MMSE}(X|Y) = \text{argmin}_{\hat{X}}\mathbb{E}[(X - \hat{X}(Y))^2] = \mathbb{E}(X|Y)$$
MMSE Theorem $$E[(X-g(Y))f(Y)] = 0 \ \forall f \implies g(Y) = \text{ MMSE}$$
Linear Least Squares Estimation $$ \begin{aligned} &\mathbb{L}[X|Y] = \min{\text{linear} \hat{X}} \mathbb{E}[|X - \hat{X}(Y)|^2] = \min{a, b_1…b_n} \mathbb{E}[|X - (a+\sum b_iY_i)|^2] \newline &\text{Let Y be a vector of all observations $Yi$} \newline &\text{Define } \Sigma{XY} = \mathbb{E}[(X-\mu_x)(Y-\muy)^T] \newline &\Sigma{Y} = \mathbb{E}[(Y-\mu_y)(Y-\mu_y)^T] \newline &\mathbb{L}[X|Y] = \mux + \Sigma{XY} \Sigma_{Y}\ ^{-1} (Y - \mu_y) \newline &\mathbb{L}[X|Y] = \mu_x + \frac{cov(X, Y)}{var(Y)} (Y - \mu_y) \end{aligned} $$
Linear Least Squared Error $$LLSE = var(X) - \Sigma{XY}\Sigma{Y}\ ^{-1}\Sigma_{YX}$$

Hilbert Spaces#

Hilbert Projection Theorem $$ \forall v \in H, U \subseteq H,\ \exists\ \text{$\min_{u \in U}$} ||u-v||:\text{ u is unique}
= 0 \ \forall\ u’ \in U $$
Hilbert Random Variable Theorem $$ = \mathbb{E}[XY]$$
LLSE in Hilbert Spaces $$ <\mathbb{L}[X|Y] - X, u> = \mathbb{E}[(\mathbb{L}[X|Y] - X)u] = 0\ \forall\ u $$
Orthogonality Principle

$$ \begin{aligned} &\mathbb{E}(\mathbb{L}[X|Y]) = \mathbb{E}[X] \newline &\mathbb{E}[(\mathbb{L}[X|Y] - X)Y_i] = 0 \newline &\mathbb{E}[(\mathbb{L}[X|Y] \cdot Y^T] = \mathbb{E}[XY^T] \end{aligned} $$

Magnitude $$||X|| = \sqrt{} = \sqrt{\mathbb{E}(|X|^2)}$$
Zero-Mean Multiple RVs $$ \begin{aligned} &\text{Let X, Y, Z zero-mean} \newline &L[X|Y, Z] = L[X|Y] - L[X|Z - L[Z|Y]] \newline &L[X|Y, Z] = L[X|Y] - L[X|Z] \text { if Y, Z uncorrelated} \end{aligned} $$