This page is a collection of all theorems taught in EECS126: Probability and Random Processes, Spring 2021. A good reference!
Link to PDF version here.
Probability Basics
- $$P(A | B) = \frac{P(A \cap B)}{P(B)}, P(B) > 0$$
- $$P(B) = \sum_{i=1}^{n} P(A_i)P(B | A_i)$$
- $$P(A_i | B) = \frac{P(A_i)P(B|A_i)}{P(B)}$$
- $$P(\bigcup_{i=1}^{\infty} A_i) \leq \sum_{i=1}^{\infty}P(A_i)$$
- $$P(A | B) = P(A) \iff \text{A and B are independent} $$
- $$P(A \cap B | C) = P(A | C) P(B | C) \implies \text{A and B conditionally independent}$$
- $$P(\cap_{i \in S} A_i) = \prod_{i \in S} P(A_i)$$
- $$^nP_k = \frac{n!}{(n-k)!}$$
- $${n\choose k} = \frac{n!}{k!(n-k)!}$$
- $${n\choose n_1, n_2... n_k} = \frac{n!}{n_1! n_2!...n_k!}$$
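The conditional-probability, total-probability, and Bayes' rule formulas above are easy to check numerically. The sketch below uses a toy two-coin setup (the priors and biases are arbitrary choices, not from the course) to confirm the numbers are consistent.

```python
# Toy check of total probability and Bayes' rule: pick a fair or biased coin
# uniformly at random (the events A_i), then flip it once (B = "heads").
P_A = {"fair": 0.5, "biased": 0.5}            # prior P(A_i)
P_B_given_A = {"fair": 0.5, "biased": 0.9}    # likelihood P(B | A_i)

# Law of total probability: P(B) = sum_i P(A_i) P(B | A_i)
P_B = sum(P_A[a] * P_B_given_A[a] for a in P_A)

# Bayes' rule: P(A_i | B) = P(A_i) P(B | A_i) / P(B)
posterior = {a: P_A[a] * P_B_given_A[a] / P_B for a in P_A}

print(P_B)          # 0.7
print(posterior)    # fair: 0.25/0.7 ~ 0.357, biased: 0.45/0.7 ~ 0.643
```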
Discrete Random Variables
- Bernoulli Random Variable $$ \begin{aligned} X = \begin{cases} 1 & \text{ with probability } p \newline 0 & \text{ with probability } 1-p \end{cases} \end{aligned} $$$$\mathbb{E}(X) = p$$$$var(X) = p(1-p)$$
- Binomial Random Variable $$P(X = k) = {n\choose k} p^k (1-p)^{n-k}$$$$\mathbb{E}(X) = np$$$$var(X) = np(1-p)$$
- Geometric Random Variable $$P(X = k) = (1-p)^{k-1} p$$$$\mathbb{E}(X) = \frac{1}{p}$$$$var(X) = \frac{1-p}{p^2}$$
- Poisson Random Variable $$P(X_\lambda = k) = \frac{e^{-\lambda}\lambda^k}{k!}$$$$\mathbb{E}(X) = \lambda $$$$var(X) = \lambda$$
- $$Poisson(\lambda) + Poisson(\mu) \sim Poisson(\lambda + \mu) \text{ if independent}$$
- Uniform Random Variable $$ \begin{equation*} P(X = k) = \begin{cases} \frac{1}{b-a+1} & \ k \in [a,b] \newline 0 & \text{ otherwise} \end{cases} \end{equation*} $$
- $$P_{X, Y} (x, y) = \Pr(X = x, Y = y)$$$$P_X(x) = \sum_{y} P_{X, Y}(x, y) \text{ and vice versa}$$
- $$P_{X|A}(x) = \frac{P(\{X = x\} \cap A)}{P(A)}$$$$P_{X|Y}(x|y) = \frac{P_{X,Y}(x,y)}{P_Y(y)}$$
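A quick numpy simulation (sample size and seed are arbitrary) that sanity-checks two of the facts above: the Binomial mean/variance formulas and the merging property of independent Poissons.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Binomial(n, p): E[X] = np, var(X) = np(1 - p)
n, p = 20, 0.3
x = rng.binomial(n, p, N)
print(x.mean(), n * p)              # ~6.0 vs 6.0
print(x.var(), n * p * (1 - p))     # ~4.2 vs 4.2

# Independent Poissons add: Poisson(lam1) + Poisson(lam2) ~ Poisson(lam1 + lam2)
lam1, lam2 = 2.0, 3.5
s = rng.poisson(lam1, N) + rng.poisson(lam2, N)
print(s.mean(), s.var())            # both ~5.5 = lam1 + lam2
```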
Expectation, Variance and Covariance
- $$\mathbb{E}(X) = \sum_{x} xP(X = x)$$
- $$\mathbb{E}[g(X)] = \sum_{x} g(x)P(X = x)$$
- $$var(X) = \mathbb{E}[(X - \mathbb{E}(X))^2] \geq 0$$
- $$\sigma_X = \sqrt{var(X)}$$
- $$\mathbb{E}(aX + bY) = a\mathbb{E}(X) + b\mathbb{E}(Y)$$
- $$\mathbb{E}(g(X, Y)) = \sum_{x}\sum_{y} g(x, y)P_{X, Y}(x, y)$$
- $$var(X+Y) = var(X) + var(Y) + 2cov(X, Y)$$
- $$\mathbb{E}(X | Y = y) = \sum_{x} x\ P_{X|Y}(x|y)$$
- $$\mathbb{E}(X) = \sum_{y} P_Y(y)\mathbb{E}(X|Y=y)$$
- $$\mathbb{E}(X) = \mathbb{E}(\mathbb{E}(X|Y))$$
- $$\mathbb{E}[\mathbb{E}[X|Y]g(Y)] = \mathbb{E}[Xg(Y)]$$
- $$\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y) \text{ if X, Y independent}$$
- $$cov(X,Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)$$
- $$\rho(X, Y) = \frac{cov(X,Y)}{\sqrt{Var(X)Var(Y)}}$$$$|\rho| \leq 1$$
- Variance of Two Independent Variables $$var(X + Y) = var(X) + var(Y) \text{ if X, Y independent}$$
- Law of Total Variance $$var(X) = Var(\mathbb{E}(X|Y)) + \mathbb{E}(var(X|Y))$$
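The tower property and the law of total variance above can be verified exactly (no sampling) by enumerating a small joint PMF; the 2×3 table below is an arbitrary example.

```python
import numpy as np

# Arbitrary joint PMF P(X=x, Y=y); rows index x in {0, 1}, columns y in {0, 1, 2}.
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.05, 0.30]])
xs = np.array([0, 1])

P_Y = P.sum(axis=0)                                   # marginal of Y
E_X = (xs[:, None] * P).sum()                         # E[X]
E_X_given_Y = (xs[:, None] * P).sum(axis=0) / P_Y     # E[X | Y = y]
var_X = ((xs[:, None] - E_X) ** 2 * P).sum()

# Tower property: E[X] = sum_y P(Y=y) E[X | Y=y]
print(E_X, (P_Y * E_X_given_Y).sum())

# Law of total variance: var(X) = var(E[X|Y]) + E[var(X|Y)]
var_of_cond_mean = (P_Y * (E_X_given_Y - E_X) ** 2).sum()
var_X_given_Y = ((xs[:, None] - E_X_given_Y) ** 2 * P).sum(axis=0) / P_Y
print(var_X, var_of_cond_mean + (P_Y * var_X_given_Y).sum())
```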
Continuous Random Variables
- $$P(X \in [a, b]) = \int_{a}^{b} f_X(x)dx$$
- $$F_X(x) = \int_{-\infty}^{x} f(t)dt$$
- Uniform Random Variable $$ \begin{aligned} f_X(x) &= \frac{1}{b-a},\ a \leq x \leq b \newline F_X(x) &= \frac{x-a}{b-a} \newline \mathbb{E}(X) &= \frac{a+b}{2} \newline var(X) &= \frac{(b-a)^2}{12} \end{aligned} $$
- Exponential Random Variable $$ \begin{aligned} f_X(x) &= \lambda e^{-\lambda x},\ x>0 \newline F_X(x) &= 1 - e^{-\lambda x} \newline \mathbb{E}(X) &= \frac{1}{\lambda} \newline var(X) &= \frac{1}{\lambda^2} \end{aligned} $$
- Gaussian Random Variable $$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/2\sigma^2}$$
- $$aN(\mu_1, \sigma^2_1) + bN(\mu_2, \sigma^2_2) \sim N(a\mu_1 + b\mu_2, a^2\sigma^2_1 + b^2\sigma_2^2) \text{ if independent}$$
- $$f_{X|Y}(x|y) = \frac{f_{X, Y}(x, y)}{f_Y(y)}$$
- $$f_{X, Y}(x, y) = f_X(x)f_Y(y) \text{ if X, Y independent}$$
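A short simulation (arbitrary parameters and seed) checking the Exponential CDF and the rule for linear combinations of independent Gaussians listed above.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Exponential(lam): check F_X(x) = 1 - exp(-lam * x) empirically at x = 1.
lam, x0 = 2.0, 1.0
x = rng.exponential(scale=1 / lam, size=N)
print((x <= x0).mean(), 1 - np.exp(-lam * x0))     # ~0.865 vs 0.865

# a*N(mu1, s1^2) + b*N(mu2, s2^2) ~ N(a*mu1 + b*mu2, a^2*s1^2 + b^2*s2^2), independent
a, b = 2.0, -1.0
y = a * rng.normal(1.0, 3.0, N) + b * rng.normal(-2.0, 0.5, N)
print(y.mean(), a * 1.0 + b * (-2.0))              # ~4.0
print(y.var(), a**2 * 9.0 + b**2 * 0.25)           # ~36.25
```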
Order Statistics
- $$\text{Let } Y = \min_{1 \leq k \leq n} X_k \text{ , iid with CDF $F_X$}$$$$F_Y(y) = 1 - (1 - F_X(y))^n$$
- $$\text{Let } Y = \max_{1 \leq k \leq n} X_k \text{ , iid with CDF $F_X$}$$$$F_Y(y) = (F_X(y))^n$$
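Both order-statistic CDFs are easy to verify by simulation; here the $X_k$ are taken to be Uniform(0, 1) (an arbitrary choice), so $F_X(y) = y$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 200_000, 5

# X_1..X_n iid Uniform(0, 1), so F_X(y) = y on [0, 1].
samples = rng.uniform(0.0, 1.0, size=(N, n))
y0 = 0.3

# Min: F_Y(y) = 1 - (1 - F_X(y))^n
print((samples.min(axis=1) <= y0).mean(), 1 - (1 - y0) ** n)   # ~0.832

# Max: F_Y(y) = F_X(y)^n
print((samples.max(axis=1) <= y0).mean(), y0 ** n)             # ~0.0024
```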
Convolution
- $$ \begin{aligned} p_Z(z) &= P(X+Y=z) = \sum_{x}P(X=x, Y=z-x) \newline &= \sum_{x}P_X(x)P_Y(z-x) \text{ if X, Y independent} \end{aligned} $$
- $$ f_Z(z) = \int_{-\infty}^{\infty} f_X(x)f_Y(z-x)dx \text{ if X, Y independent} $$
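For independent discrete random variables, the convolution sum above is exactly what `np.convolve` computes. The sketch below convolves the PMF of a fair die with itself to get the PMF of the sum of two dice.

```python
import numpy as np

# PMF of one fair die on values 1..6
die = np.full(6, 1 / 6)

# PMF of the sum of two independent dice: discrete convolution of the two PMFs
p_sum = np.convolve(die, die)          # index k corresponds to the sum k + 2
for total, prob in enumerate(p_sum, start=2):
    print(total, round(prob, 4))       # peaks at 7 with probability 6/36
```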
Moment Generating Function
- $$ \begin{aligned} M_X(s) &= \mathbb{E}[e^{sX}] \newline &=\int_{-\infty}^{\infty} e^{sx} f_X(x)dx \end{aligned} $$
- $$ \frac{d^nM(s)}{ds^n} |_{s=0} = \int x^nf(x)dx = \mathbb{E}[X^n] $$
- Poisson $$M(s) = e^{\lambda (e^s-1)}$$
- Exponential $$M(s) = \frac{\lambda}{\lambda - s}\text{ , $s < \lambda$}$$
- Standard Normal $N(0, 1)$ $$M(s) = e^{s^2/2}$$
- Moments of the Standard Normal $$ \mathbb{E}(X^m) = \begin{cases} 0 & \text{ , m odd} \newline 2^{-m/2}\frac{m!}{(m/2)!} & \text{ , m even} \end{cases} $$
- Geometric $$M(s) = \frac{pe^s}{1- (1-p)e^s}$$
- Bernoulli $$M(s) = 1 - p + pe^s$$
- Binomial $$M(s) = (1 - p + pe^s)^n$$
- Uniform on $[a, b]$ $$ \begin{equation*} M(s) = \begin{cases} \frac{e^{bs} - e^{as}}{s(b-a)} &\ s \neq 0 \newline 1 &\ s = 0 \end{cases} \end{equation*} $$
- $$ \begin{aligned} \text{Let } Z &= \sum X_i \text{ , $X_i$ independent} \newline M_Z(s) &= \prod M_{X_i}(s) \end{aligned} $$
- Jointly Gaussian: for $Y = a^TX$ with $X \sim N(\mu_x, \Sigma)$, $$M_Y(s) = M_X(sa) = \exp{(s(a^T\mu_x) +\frac{1}{2}s^2a^T\Sigma a)}$$
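The identity that the $n$-th derivative of $M(s)$ at $s = 0$ gives $\mathbb{E}[X^n]$ can be sanity-checked with finite differences; the sketch below uses the Exponential MGF with $\lambda = 2$ (an arbitrary choice).

```python
# Exponential(lam): M(s) = lam / (lam - s), so M'(0) = 1/lam and M''(0) = 2/lam^2.
lam, h = 2.0, 1e-4
M = lambda s: lam / (lam - s)

first = (M(h) - M(-h)) / (2 * h)               # central difference for M'(0)
second = (M(h) - 2 * M(0) + M(-h)) / h**2      # central difference for M''(0)

print(first, 1 / lam)          # ~0.5 = E[X]
print(second, 2 / lam**2)      # ~0.5 = E[X^2]
```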
Bounds
- Markov Inequality $$P(X \geq a) \leq \frac{\mathbb{E}[X]}{a} \text{ , $X \geq 0$, $a > 0$}$$
- Chebyshev Inequality $$P(|X - \mu| \geq c) \leq \frac{\sigma^2}{c^2}$$
- Chernoff Bound $$P(X \geq a) \leq \frac{\mathbb{E}[e^{sX}]}{e^{sa}} \text{ , $s > 0$}$$$$P(X \leq a) \leq \frac{M(s)}{e^{sa}} \text{ , $s \leq 0$}$$
- Jensen's Inequality $$f(\mathbb{E}(X)) \leq \mathbb{E}[f(X)] \text{ , f is convex, $f''(x) > 0$}$$
- Weak Law of Large Numbers $$\lim_{n \xrightarrow{} \infty} P(|M_n - \mu| \geq \epsilon) = 0 \text{ , } M_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
- Strong Law of Large Numbers $$P(\lim_{n\xrightarrow{} \infty} M_n = \mu) = 1$$
- Central Limit Theorem $$ \begin{aligned} \text{Define } Z &= \frac{S_n - n\mu}{\sqrt{n}\sigma} \text{ , } S_n = \sum_{i=1}^{n} X_i \newline F_Z(z) &\xrightarrow{} \Phi(z) \end{aligned} $$
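The Central Limit Theorem above is easy to see numerically: standardize the sum of $n$ iid Uniform(0, 1) variables (the distribution, $n$, and seed are arbitrary choices) and compare the empirical CDF to the standard normal CDF.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
N, n = 100_000, 50

# X_i iid Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, sigma = 0.5, sqrt(1 / 12)
S_n = rng.uniform(0.0, 1.0, size=(N, n)).sum(axis=1)
Z = (S_n - n * mu) / (sqrt(n) * sigma)          # standardized sum

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))    # standard normal CDF
for z in (-1.0, 0.0, 1.0):
    print(z, (Z <= z).mean(), Phi(z))           # empirical CDF vs Phi(z)
```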
Convergences
- Almost Sure Convergence $$P(\lim_{n \xrightarrow{} \infty} X_n = X) = 1$$
- Convergence in Probability $$\lim_{n \xrightarrow{} \infty} P(|X_n - X| \geq \epsilon) = 0 \ \ \forall\ \epsilon > 0$$
- Convergence in Distribution $$\lim_{n \xrightarrow{} \infty} F_{X_n}(x) = F_X(x)\ \ \forall x \text{ where $F_X$ is continuous}$$
Entropy
- $$H(X) = - \sum_{i=1}^{n} p_i \ln(p_i)$$
- $$H(X, Y) = H(Y) + H(X|Y) = H(X) + H(Y|X)$$
- Asymptotic Equipartition Property (AEP) $$-\frac{1}{n} \log p(X_1, X_2... X_n) \xrightarrow{p} H(X)$$
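A small exact check of the entropy chain rule on an arbitrary 2×3 joint PMF, computing $H(X|Y)$ directly as a weighted average of conditional entropies (entropies in nats, matching the natural log above).

```python
import numpy as np

def H(p):
    """Entropy (in nats) of a PMF given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Arbitrary joint PMF of (X, Y); rows index x, columns index y.
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.05, 0.30]])

H_XY = H(P.ravel())
P_Y = P.sum(axis=0)
H_Y = H(P_Y)

# Conditional entropy computed directly: H(X|Y) = sum_y P(Y=y) H(X | Y=y)
H_X_given_Y = sum(P_Y[j] * H(P[:, j] / P_Y[j]) for j in range(P.shape[1]))

# Chain rule: H(X, Y) = H(Y) + H(X|Y)
print(H_XY, H_Y + H_X_given_Y)    # the two values agree
```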
Information Theory
- Source Coding Theorem: As $N \xrightarrow{} \infty$, $N$ iid random variables, each with entropy $H(X)$, can be compressed into $NH(X)$ bits with negligible loss of information; compressing into fewer than $NH(X)$ bits makes information loss virtually certain.
- Channel Coding Theorem: Define the rate of a code as $R$ = (# of message bits) / (# of bits transmitted). Reliable communication (error probability $\xrightarrow{} 0$) is possible at every rate $R < C$ and impossible at any rate $R > C$, where $C$ is the channel capacity.
- Binary Erasure Channel with erasure probability $p$: $$C = 1 - p$$
- Binary Symmetric Channel with crossover probability $p$: $$C = 1 - H(p)$$
- $$\mathbb{E}[\text{number of bits}] \leq n(H(X) + \epsilon)$$
- $$I(X;Y) = \sum P_{XY}(x, y)\log\frac{P_{XY}(x, y)}{P_X(x)P_Y(y)}$$
- $$I(X;Y) = H(X) + H(Y) - H(X, Y)$$
- $$C = \max_{p_x} I(X;Y)$$
- $$P(\text{error}) = 2^{-n(1-p) + L(n)}$$ where $n$ = # of bits sent and $L(n)$ = # of bits in the message
Discrete Time Markov Chains
- $$P(X_{n+1} | X_n...X_1) = P(X_{n+1} | X_n)$$
- $$P_{ij}^n = [P^n]_{ij}$$
- $$d(i) = \gcd\ \{n \geq 1: P_{ii}^n > 0\}$$
- $$\pi P = \pi$$
- Hitting Time $$ \begin{equation*} \beta(i) = \begin{cases} 1 + \sum_j p_{ij}\beta(j) & i \notin A \newline 0 & i \in A \end{cases} \end{equation*} $$
- Detailed Balance $$\pi_j P_{ji} = \pi_i P_{ij},\ \ i, j \in S$$
- Random Walk on a Graph $$\pi(i) = \frac{degree(i)}{\sum_{j}degree(j)} = \frac{degree(i)}{2E}$$ where $E$ = # of edges
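The stationary distribution can be computed numerically by solving $\pi P = \pi$ together with the normalization constraint; the 3-state transition matrix below is arbitrary.

```python
import numpy as np

# Transition matrix of a small 3-state chain (rows sum to 1); values are arbitrary.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

# Stationary distribution: solve pi P = pi with sum(pi) = 1,
# i.e. (P^T - I) pi = 0 plus a normalization row.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)        # stationary distribution
print(pi @ P)    # equals pi (up to floating point error)
```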
Poisson Processes
- $$N_t \sim Poisson(\lambda t),\ \ P(N_t = n) = \frac{e^{-\lambda t}(\lambda t)^n}{n!}$$
- Interarrival Times $$S_i \sim Exp(\lambda)$$
- $n$-th Arrival Time (Erlang) $$f_{T_n}(s) = \frac{\lambda e^{-\lambda s}(\lambda s)^{n-1}}{(n-1)!}$$
- Independent Increments $$N_{t_i} - N_{t_{i-1}} \sim Poisson(\lambda (t_i - t_{i-1}))$$
- $$PP(\lambda_1) + PP(\lambda_2) \sim PP(\lambda_1 + \lambda_2)$$
- $$P(\min\{T_a, T_b\} = T_a) = \frac{\lambda_a}{\lambda_a + \lambda_b}$$
- Random Incidence: the interarrival interval containing a fixed time has length $$L \sim Erlang(2, \lambda)$$
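A Poisson process can be simulated directly from its Exp($\lambda$) interarrival times; counting arrivals in $[0, t]$ should then give mean and variance close to $\lambda t$. The rate, horizon, and helper name `count_arrivals` below are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, t, trials = 3.0, 2.0, 50_000

def count_arrivals(lam, t, trials):
    # Generate more Exp(lam) interarrivals than should ever be needed in [0, t],
    # take cumulative sums to get arrival times, then count those before t.
    k = int(4 * lam * t) + 30
    arrival_times = rng.exponential(1 / lam, size=(trials, k)).cumsum(axis=1)
    return (arrival_times <= t).sum(axis=1)

n_t = count_arrivals(lam, t, trials)
print(n_t.mean(), n_t.var(), lam * t)    # mean and variance both ~6.0 = lam * t
```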
Continuous Time Markov Chains
- $$P(X_{t+\tau} = j\ |\ X_t = i, X_s = i_s\ \forall\ 0 \leq s < t) = P(X_\tau = j | X_0 = i)$$
- $$Q(i, i) = -\sum_{j\neq i} Q(i, j)$$
- $$\sum_{i\neq j} \pi_i Q(i, j) = \pi_j \sum_{k \neq j} Q(j, k)$$
- Uniformization $$ \begin{aligned} \text{Let } q &= \sup_i\ q(i) \text{ , the largest transition rate out of any state} \newline R &= I + \frac{1}{q}Q \end{aligned} $$
- $$ \begin{equation*} \beta(i) = \begin{cases} \frac{1}{q(i)} + \sum_{j \neq i} \frac{Q(i, j)}{q(i)} \beta(j) & i \notin A \newline 0 & i \in A \end{cases} \end{equation*} $$
Random Graph
- $$P(G = G_0) = p^{m}(1-p)^{{n\choose 2} - m} \text{ , } m = \#\text{edges in } G_0$$$$\#\text{edges}(G) \sim Binomial({n\choose 2}, p)$$
- Degree of a node $$D \sim Binomial(n-1, p) \xrightarrow{n \xrightarrow{} \infty} Poisson((n-1)p)$$
- $$ \begin{aligned} Let\ p(n) = \lambda\frac{\ln(n)}{n} \newline P(G \text{ is connected}) \xrightarrow{n \xrightarrow{} \infty} 0 &\ ,\ \lambda < 1 \newline P(G \text{ is connected})\xrightarrow{n \xrightarrow{} \infty} 1 &\ ,\ \lambda > 1 \end{aligned} $$
- $$P(e \in G_1 \cup G_2) = P(\{e \in G_1\} \cup \{e \in G_2\}) = p_1 + p_2 - p_1p_2$$
Statistical Inference
- $$P(X = x | Y = y) = \frac{P_{Y|X}(y|x) \pi(x)}{\sum_{i} P_{Y|X}(y|i) \pi(i)}$$
- $$\text{MAP}(X|Y=y) = \text{argmax}_{x} P_{X|Y}(x|y) = \text{argmax}_{x} P_{Y|X}(y|x)\pi(x)$$
- Maximum Likelihood Estimation (MLE) $$\text{MLE}(X|Y=y) = \text{argmax}_{x} P_{Y|X}(y|x)$$
- Likelihood Ratio $$L(y) = \frac{P_{Y|X}(y|1)}{P_{Y|X}(y|0)}$$
- MLE for a BSC observation ($Y = X \oplus Z,\ Z \sim Bern(p)$): $$ \begin{equation*} MLE(X|Y=y) = \begin{cases} y & \text{if }p \leq 1/2 \newline 1-y & \text{if }p > 1/2 \end{cases} \end{equation*} $$ In general: $$ \begin{equation*} MLE(X|Y=y) = \begin{cases} 1 & \text{if } L(y) \geq 1 \newline 0 & \text{if } L(y) < 1 \end{cases} \end{equation*} $$
- $$ \begin{equation*} \text{MAP}(X | Y = y) = \begin{cases} 0 &\ \text{if } L(y) < \frac{\pi_0}{\pi_1} \newline 1 &\ \text{if } L(y) \geq \frac{\pi_0}{\pi_1} \end{cases} \end{equation*} $$
- Likelihood ratio for a Gaussian observation ($Y = X + Z,\ Z \sim N(0, \sigma^2),\ X \in \{0, 1\}$): $$L(y) = \exp{[\frac{y}{\sigma^2} - \frac{1}{2\sigma^2}]}$$
- $$ \text{MAP}(X | Y = y) = \begin{cases} 1 &\ \text{if } L(y) \geq \frac{\pi_0}{\pi_1} \iff y \geq \frac{1}{2} + \sigma^2 \ln(\frac{\pi_0}{\pi_1}) \newline 0 &\ \text{otherwise} \end{cases} $$
- $$ \begin{equation*} MLE(X|Y=y) = \begin{cases} 1 & \text{if } L(y) \geq 1 \iff y \geq \frac{1}{2} \newline 0 & \text{otherwise} \end{cases} \end{equation*} $$
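For the Gaussian observation model above, the MAP and MLE threshold rules can be compared by simulation; with a skewed prior the MAP rule should achieve the lower error probability. The prior, noise level, and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
N, sigma = 200_000, 1.0
pi0, pi1 = 0.8, 0.2

# X in {0, 1} with prior (pi0, pi1), observed through Gaussian noise: Y = X + Z.
X = (rng.random(N) < pi1).astype(int)
Y = X + rng.normal(0.0, sigma, N)

# MAP rule: decide 1 iff y >= 1/2 + sigma^2 * ln(pi0 / pi1); MLE uses threshold 1/2.
map_hat = (Y >= 0.5 + sigma**2 * np.log(pi0 / pi1)).astype(int)
mle_hat = (Y >= 0.5).astype(int)

print((map_hat != X).mean())   # MAP error probability
print((mle_hat != X).mean())   # MLE error probability (larger here: the prior is skewed)
```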
Binary Hypothesis Testing
- Neyman-Pearson Lemma $$ \begin{aligned} &\text{Minimizes P(false negatives) with P(false positive) $\leq \beta$} \newline &\hat{X} = \begin{cases} 1\ & L(y) > \lambda \newline 0\ & L(y) < \lambda \newline Bern(\gamma)\ & L(y) = \lambda \end{cases} \newline &\text{Setting } P(\hat{X} = 1 | X = 0) = \beta \end{aligned} $$
Estimation
- Mean Squared Error $$\mathbb{E}[(X - \hat{X}(Y))^2]$$
- $$\text{MMSE}(X|Y) = \text{argmin}_{\hat{X}}\mathbb{E}[(X - \hat{X}(Y))^2] = \mathbb{E}(X|Y)$$
- $$E[(X-g(Y))f(Y)] = 0 \ \forall f \implies g(Y) = \text{ MMSE}$$
- $$ \begin{aligned} &\mathbb{L}[X|Y] = \text{argmin}_{\text{linear } \hat{X}} \mathbb{E}[|X - \hat{X}(Y)|^2] = \text{argmin}_{a, b_1...b_n} \mathbb{E}[|X - (a+\sum b_iY_i)|^2] \newline &\text{Let Y be a vector of all observations $Y_i$} \newline &\text{Define } \Sigma_{XY} = \mathbb{E}[(X-\mu_x)(Y-\mu_y)^T] \newline &\Sigma_{Y} = \mathbb{E}[(Y-\mu_y)(Y-\mu_y)^T] \newline &\mathbb{L}[X|Y] = \mu_x + \Sigma_{XY} \Sigma_{Y}^{-1} (Y - \mu_y) \newline &\mathbb{L}[X|Y] = \mu_x + \frac{cov(X, Y)}{var(Y)} (Y - \mu_y) \text{ (scalar Y)} \end{aligned} $$
- Quadratic error of the LLSE $$\mathbb{E}[(X - \mathbb{L}[X|Y])^2] = var(X) - \Sigma_{XY}\Sigma_{Y}^{-1}\Sigma_{YX}$$
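A quick check of the vector LLSE formula: generate data where X is a known linear function of the observations plus noise (coefficients 1.5 and -0.7 chosen arbitrarily), then recover the coefficients from the sample covariances and compute the resulting quadratic error.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100_000

# Zero-mean data where X depends linearly on (Y1, Y2) plus independent noise.
Y = rng.normal(size=(2, N))
X = 1.5 * Y[0] - 0.7 * Y[1] + rng.normal(scale=0.5, size=N)

# L[X|Y] = mu_x + Sigma_XY Sigma_Y^{-1} (Y - mu_y); here all means are ~0.
Sigma_Y = np.cov(Y)                                           # 2x2 covariance of Y
Sigma_XY = np.array([np.cov(X, Y[i])[0, 1] for i in range(2)])
coeffs = Sigma_XY @ np.linalg.inv(Sigma_Y)

print(coeffs)                            # ~[1.5, -0.7]

# Quadratic error: var(X) - Sigma_XY Sigma_Y^{-1} Sigma_YX (~0.25 here)
print(X.var() - Sigma_XY @ np.linalg.inv(Sigma_Y) @ Sigma_XY)
```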
Hilbert Spaces
- Hilbert Projection Theorem $$\forall v \in H \text{ and closed subspace } U \subseteq H,\ \exists\ \text{a unique } u^* = \text{argmin}_{u \in U} ||u-v||$$$$\langle v - u^*, u' \rangle = 0 \ \forall\ u' \in U$$
- Inner product on random variables $$\langle X, Y \rangle = \mathbb{E}[XY]$$$$||X|| = \sqrt{\langle X, X \rangle} = \sqrt{\mathbb{E}(|X|^2)}$$
- Orthogonality Principle $$ \langle \mathbb{L}[X|Y] - X, u \rangle = \mathbb{E}[(\mathbb{L}[X|Y] - X)u] = 0\ \ \forall\ u \text{ linear in } Y $$
- Innovations $$ \begin{aligned} &\text{Let X, Y, Z be zero-mean} \newline &\mathbb{L}[X|Y, Z] = \mathbb{L}[X|Y] + \mathbb{L}[X|Z - \mathbb{L}[Z|Y]] \newline &\mathbb{L}[X|Y, Z] = \mathbb{L}[X|Y] + \mathbb{L}[X|Z] \text{ if Y, Z uncorrelated} \end{aligned} $$
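The innovations identity above can be verified exactly from a covariance matrix, without sampling: compute the LLSE coefficients directly and via the innovation Z - L[Z|Y], and check that they agree. The covariance matrix below is an arbitrary positive definite example.

```python
import numpy as np

# Zero-mean jointly distributed (X, Y, Z) specified by a covariance matrix.
# Order of variables: [X, Y, Z].
S = np.array([[4.0, 1.2, 0.8],
              [1.2, 2.0, 0.5],
              [0.8, 0.5, 1.5]])

# Direct LLSE coefficients: L[X | Y, Z] = Sigma_{X,(Y,Z)} Sigma_{(Y,Z)}^{-1} [Y, Z]^T
direct = S[0, 1:] @ np.linalg.inv(S[1:, 1:])

# Innovations: Z~ = Z - L[Z|Y] is uncorrelated with Y, and
# L[X | Y, Z] = L[X | Y] + L[X | Z~].
a = S[0, 1] / S[1, 1]                    # L[X|Y] = a Y
c = S[2, 1] / S[1, 1]                    # L[Z|Y] = c Y, so Z~ = Z - c Y
var_Zt = S[2, 2] - c * S[2, 1]           # var(Z~)
cov_XZt = S[0, 2] - c * S[0, 1]          # cov(X, Z~)
b = cov_XZt / var_Zt                     # L[X|Z~] = b Z~

# Combined estimate written in terms of (Y, Z): (a - b*c) Y + b Z
print(direct, np.array([a - b * c, b]))  # the two coefficient vectors match
```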