Author: Eiko

Time: 2025-02-17 13:04:14 - 2025-02-17 13:04:14 (UTC)

High Dimensional Statistics

  • Concentration and its uses

  • Sub-Gaussian random variables

  • Sub-Exponential random variables

Concentration

Let \(X_i\) be a sequence of i.i.d random variables

Basic Concentration Inequality

\[ \mathbb{P}\left[ f(|X-\mu|) \ge f(t) \right] \leq \min_f \frac{\mathbb{E}f(|X-\mu|)}{f(t)} \]

U-Statistics

Canonical statistics \(\sum X_i/n \sim \mathbb{E}X\), suppose we want to estimate \(\mathbb{E}|X_1-X_2|\), we would consider using \(\sum_{i<j} \frac{|x_i-x_j|}{\binom{n}{2}}\).

The \(U\)-statistics are of the form

\[U = \frac{1}{\binom{n}{2}} \sum_{i<j} g(x_i, x_j)\]

where \(g\) is a symmetric function of two variables.

Imagine that we have \(\|g\|_\infty <b\) or assume \(|x|<b\),

\[\begin{align*} U(x_1,\dots,x_n) - U(x_1,\dots,x_j',\dots,x_n)| &\le \frac{1}{\binom{n}{2}} \sum_{i, i\neq j} |g(x_i,x_j) - g(x_i,x_j')|\\ &\le \frac{2b(n-1)}{\binom{n}{2}} \\ &= \frac{4b}{n}. \end{align*}\]

\[ \mathbb{P}(|U-\mathbb{E}U| \ge t) \le 2 \exp\left(-\frac{nt^2}{8b^2}\right). \]