Author: Eiko

Time: 2024-12-28 15:59:49 - 2025-01-03 12:56:59 (UTC)

Reference:

  • Theoretical Statistics by Robert W. Keener

  • Probability Theory by Eiko

  • Foundations of Modern Probability by Olav Kallenberg

Let \(\Omega\) be a parameter space, \(\Omega = \Omega_0\cup \Omega_1\) a partition of this space, and let \(X\sim \mathbb{P}(X|\theta)\) be an observation whose law depends on a parameter \(\theta\in\Omega\). \(H_i\) is the hypothesis that \(\theta\in \Omega_i\).

Hypothesis testing aims to tell which of the two competing hypotheses \(H_0\) or \(H_1\) is correct by observing \(X\).

Introduction and Intuitive Motivation

Hypothesis testing answers everyday questions — ‘Does this treatment work?’ ‘Is this coin fair?’ — with a statistically defensible decision rule based on data. The procedure must strike a balance:

  • Detect real effects when they exist (power).
  • Avoid false alarms when nothing is happening (Type I error).

Error Trade‑off

                Truth \(H_0\)                  Truth \(H_1\)
Decide \(H_0\)  ✅ Correct                     Type II error (\(\beta\))
Decide \(H_1\)  Type I error (\(\alpha\))      ✅ Correct (\(1-\beta\))

The p-value

The p-value is the smallest \(\alpha\) at which a level‑\(\alpha\) test would still reject \(H_0\) for the observed data. A tiny p-value means either a rare event under \(H_0\) happened or \(H_0\) is implausible.

Test Functions

Non-Randomized Tests

A non-randomized test of \(H_0\) versus \(H_1\) is specified by a critical region \(S\): if \(X\in S\), we reject \(H_0\) in favor of \(H_1\).

The power function \(\beta_S:\Omega\to \mathbb{R}\) gives the probability of rejecting \(H_0\) as a function of \(\theta\):

\[\beta_S(\theta) = \mathbb{P}(X\in S|\theta)\]

The significance level \(\alpha_S\) is the worst-case probability of falsely rejecting \(H_0\) when it is true:

\[\alpha_S = \sup_{\theta\in \Omega_0} \beta_S(\theta).\]

In theory we would want \(\beta_S = 1_{\Omega_1}\), which would imply \(\alpha_S = 0\), but this is not achievable in practice.

Randomized Tests

Sometimes, instead of giving a critical region \(S\) (equivalently, the indicator \(1_S\)), we give a critical function \(\varphi(x)\in[0,1]\), the probability of rejecting \(H_0\) upon observing \(X=x\). A non-randomized test is then just the special case \(\varphi = 1_S\).

In this case, the power function is

\[ \beta_\varphi(\theta) = \mathbb{E}(\varphi(X)|\theta) \]

and the significance level is

\[ \alpha_\varphi = \sup_{\theta\in \Omega_0} \beta_\varphi(\theta) = \sup_{\theta\in \Omega_0} \mathbb{E}(\varphi(X)|\theta). \]

The main advantage of randomized tests is that they form a convex set: convex combinations of tests are again tests.

Simple Hypothesis And Simple Tests

A hypothesis \(H_i\) is simple if \(\Omega_i\) is a singleton.

Neyman-Pearson Lemma

Assume \(H_0\) and \(H_1\) are both simple. In this case the Neyman-Pearson Lemma describes all reasonable tests. Let \(\mu_1 = \mathbb{P}(X|\theta_1)\) and \(\mu_0 = \mathbb{P}(X|\theta_0)\) be the distributions of \(X\) under \(H_1\) and \(H_0\) respectively.

We have

\[\alpha_\varphi = \mu_0(\varphi) = \int \varphi(x) \mu_0(dx)\] \[\beta_\varphi(\theta_i) = \mu_i(\varphi) = \int \varphi(x) \mu_i(dx).\]

We would want to choose \(\varphi\) with \(\mu_0(\varphi)\) as close to \(0\) and \(\mu_1(\varphi)\) as close to \(1\) as possible. Consider maximizing \(\beta_\varphi(\theta_1)\) subject to \(\alpha_\varphi\le \alpha\).

Lagrange Multiplier Lemma

  • Let \(k\ge 0\) be any constant. Then any \(\varphi^*\) maximizing \(\mu_1(\varphi) - k\mu_0(\varphi)\) also maximizes \(\mu_1(\varphi)\) subject to \(\mu_0(\varphi)\le \alpha\), where \(\alpha = \mu_0(\varphi^*)\).

    \[\begin{align*} \varphi^* &\in \mathrm{argmax}_\varphi \left(\mu_1(\varphi) - k \mu_0(\varphi) \right) \\ &\subset \mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi). \end{align*}\]

  • Moreover, if \(k>0\), any function \(\varphi^{**}\) maximizing \(\mu_1(\varphi)\) subject to \(\mu_0(\varphi)\le \alpha\) must have \(\mu_0(\varphi^{**}) = \alpha\).

    \[ \varphi^{**}\in \mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi) \subset \{ \varphi: \mu_0(\varphi) = \alpha \}.\]

Note that \(\varphi^*\) and \(\alpha\) depend on \(k\) here.

Proof.

  • Let \(\varphi^*\) be a maximizer of \(\mu_1(\varphi) - k\mu_0(\varphi)\). It suffices to prove that \(\mu_0(\varphi)\le \alpha \Rightarrow \mu_1(\varphi)\le \mu_1(\varphi^*)\).

    We have

    \[ \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) -\alpha),\]

    since \(\varphi^*\) maximizes \(\mu_1 - k\mu_0\) and the constant \(k\alpha\) cancels. Now \(\mu_0(\varphi) - \alpha \le 0\) implies \(-k(\mu_0(\varphi) - \alpha)\ge 0\), therefore

    \[\begin{align*} \mu_1(\varphi) &\le \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \\ &\le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) -\alpha) \\ &= \mu_1(\varphi^*). \end{align*}\]

  • We know that \(\varphi^{**}\) and \(\varphi^*\) are both in \(\mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi)\). Therefore \(\mu_1(\varphi^{**})=\mu_1(\varphi^*)\). The fact that \(\varphi^*\in \mathrm{argmax}(\mu_1(\varphi) - k\mu_0(\varphi))\) implies

    \[\begin{align*} \mu_1(\varphi^*) - k\mu_0(\varphi^*) &\ge \mu_1(\varphi^{**}) - k\mu_0(\varphi^{**}) \\ &= \mu_1(\varphi^*) - k\mu_0(\varphi^{**}). \end{align*}\]

    Therefore \(k\mu_0(\varphi^{**}) \ge k\mu_0(\varphi^*)\); since \(k>0\), \(\mu_0(\varphi^{**}) \ge \mu_0(\varphi^*) = \alpha\), and combined with the constraint \(\mu_0(\varphi^{**})\le \alpha\) this forces \(\mu_0(\varphi^{**}) = \alpha\).

How To Maximize \(\mu_1(\varphi)-k\mu_0(\varphi)\)?

We know that \(\mu_1 - k\mu_0\) is a finite signed measure, so by the Hahn–Jordan decomposition it can be uniquely written as the difference of two mutually singular finite measures

\[ \mu_1 - k \mu_0 = \nu_+ - \nu_- .\]

So maximizing \(\mu_1(\varphi) - k\mu_0(\varphi)\) is equivalent to maximizing \(\nu_+(\varphi) - \nu_-(\varphi)\). It is then clear that we can pick \(\varphi = 1_{A_+}\), where \(A_+\) is a set carrying \(\nu_+\) with \(\nu_-(A_+)=0\), and we are free to choose any value in \([0,1]\) on sets of \(|\mu_1 - k\mu_0|\)-measure zero.

If the \(\mu_i\) have densities with respect to a common measure \(\mu\), the set \(A_+\) is simply \(\left\{x: \frac{\mathrm{d} \mu_1}{\mathrm{d} \mu} > k \frac{\mathrm{d} \mu_0}{\mathrm{d} \mu}\right\}\). This is a slight generalization of the likelihood ratio test: ignoring the division-by-zero problem, it can be written as \(\left\{\frac{\mathrm{d} \mu_1}{\mathrm{d} \mu_0} > k\right\}\).

The Neyman-Pearson Lemma

The Lemma states that, in the simple-vs-simple scenario, for any level \(\alpha\in [0,1]\) there exists a likelihood ratio test \(\varphi_\alpha\) (meaning \(1_{L>k}\le \varphi_\alpha\le 1_{L\ge k}\), with possibly different values on a \(|\mu_1-k\mu_0|\)-null set) with exact level \(\alpha\), i.e. \(\mu_0(\varphi_\alpha)=\alpha\). The likelihood ratio test \(\varphi_\alpha\) maximizes \(\mu_1(\varphi) - k\mu_0(\varphi)\), and any likelihood ratio test maximizes the power \(\beta_{\varphi}(\theta_1)\) subject to the significance level constraint \(\alpha_{\varphi} \le \alpha\).

Some Detailed Results Relating To Neyman-Pearson Lemma

  • For \(\alpha\in [0,1]\), let \(k\) be a critical value for a likelihood ratio test \(\varphi_\alpha\) in the sense of the Neyman-Pearson Lemma, i.e.

    \[\varphi_\alpha = 1_{\left\{x:\frac{\mathrm{d} \mu_1(x)}{\mathrm{d} \mu_0(x)} > k\right\}} \text{ a.e. in } |\mu_1 - k \mu_0|.\]

    Then \(\mu_0(\varphi_\alpha) = \alpha\) and \(\mu_1(\varphi_\alpha) = \beta_{\varphi_\alpha}(\theta_1)\).

    We have

    \[\varphi^{**}\in \mathrm{argmax}_{\mu_0\le \alpha}\mu_1 \Rightarrow \varphi^{**}=\varphi_\alpha \text{ a.e. in } |\mu_1 - k \mu_0|.\]

  • If \(\mu_0\neq \mu_1\) or \(k\neq 1\), and \(\varphi_\alpha\) is a likelihood ratio test with level \(\alpha\in (0,1)\), then \(\mu_1(\varphi_\alpha) > \alpha\), i.e. the test has power strictly above its level.

\[\mu_0\neq \mu_1\Rightarrow \mu_1(\varphi_\alpha)>\alpha. \]

Proof.

  • We already proved that \(\mu_0(\varphi^{**})=\mu_0(\varphi_\alpha) = \alpha\), and both functions maximize \(\mu_1\) subject to \(\mu_0\le\alpha\), so \(\mu_1(\varphi^{**})=\mu_1(\varphi_\alpha)\). Hence

    \[ (\mu_1 - k\mu_0)(\varphi^{**}) = (\mu_1 - k\mu_0)(\varphi_\alpha).\]

    By the construction of \(\varphi_\alpha\) (equal to \(1\) where \(\mu_1 - k\mu_0 > 0\) and to \(0\) where \(\mu_1 - k\mu_0 < 0\), up to \(|\mu_1 - k\mu_0|\)-null sets), the integrand \((\varphi_\alpha - \varphi^{**})\,\mathrm{d}(\mu_1 - k\mu_0)\) is non-negative, and by the display above it integrates to zero. This implies \(\varphi^{**}= \varphi_\alpha\) a.e. in \(|\mu_1 - k \mu_0|\).

  • Consider the constant test \(\varphi_c \equiv \alpha\in (0,1)\). Since \(\varphi_\alpha\in \mathrm{argmax}_{\mu_0\le\alpha}\mu_1\), we have \(\mu_1(\varphi_\alpha)\ge \mu_1(\varphi_c) = \alpha\). If equality held, \(\varphi_c\) would also be a maximizer, hence \(\varphi_c = \varphi_\alpha\) a.e. in \(|\mu_1 - k \mu_0|\); but \(\varphi_\alpha\in \{0,1\}\) a.e. in \(|\mu_1 - k \mu_0|\), so this is only possible when \(|\mu_1 - k\mu_0| = 0\), i.e. \(\mu_0=\mu_1\) and \(k=1\).

Examples

  • Suppose we are testing

    \[\mathbb{P}(X|\theta) \sim \text{Exponential}(\theta) \sim \theta e^{-\theta x}1_{x\ge 0}\,\mathrm{d}{x}\] with hypotheses \(H_0: \theta = \theta_0\) and \(H_1:\theta=\theta_1\); for simplicity assume \(\theta_1>\theta_0\). The likelihood ratio test is of the form

    \[ \frac{\theta_1e^{-\theta_1x}}{\theta_0e^{-\theta_0x}} > k \Leftrightarrow x < \frac{1}{\theta_1-\theta_0}\log\frac{\theta_1}{k\theta_0} = x_k.\]

    \[\alpha = \mu_0\varphi = \int_0^{x_k} \theta_0e^{-\theta_0x}\,\mathrm{d}{x} = 1 - e^{-\theta_0x_k}.\]

    \[ x_k = \frac{1}{\theta_0}\log\frac{1}{1-\alpha}.\]

    And the test with level \(\alpha\) is simply given by \(\varphi_\alpha = 1_{x<\frac{1}{\theta_0}\log \frac{1}{1-\alpha}}\). Some magic is happening here: this test is optimal, maximizing \(\mu_1(\varphi)\) among tests of level \(\le \alpha\), yet it is independent of \(\theta_1\)! (This is an example of a Uniformly Most Powerful test. An interesting question is: when does this happen?)
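
    As a sanity check, here is a minimal simulation sketch (Python with numpy; the parameter values are illustrative, not from the text) confirming that \(\varphi_\alpha\) has level \(\alpha\) under \(H_0\); a short computation also gives its power as \(1-(1-\alpha)^{\theta_1/\theta_0}\).

        import numpy as np

        # Monte Carlo check of the exponential likelihood ratio test.
        # theta0, theta1, alpha are illustrative choices.
        rng = np.random.default_rng(0)
        theta0, theta1, alpha, n_sim = 1.0, 2.0, 0.05, 200_000

        x_k = np.log(1 / (1 - alpha)) / theta0      # rejection threshold x_k

        x0 = rng.exponential(1 / theta0, n_sim)     # draws under H_0 (scale = 1/rate)
        x1 = rng.exponential(1 / theta1, n_sim)     # draws under H_1

        print("empirical level:", np.mean(x0 < x_k))   # should be close to alpha
        print("empirical power:", np.mean(x1 < x_k))
        print("exact power    :", 1 - (1 - alpha) ** (theta1 / theta0))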

  • Consider a very simple random variable \(X\sim \text{Bernoulli}(p)\), with \(H_0: p=\frac{1}{2}\) and \(H_1: p=\frac{1}{4}\). The likelihood ratio is

    \[ L(x) = \begin{cases} \frac{1}{2} & x=1 \\ \frac{3}{2} & x=0. \end{cases}\]

    Then clearly there are \(5\) different regions of \(k\) we can take to form different tests \(\varphi = \begin{cases} 1 & L(x) > k \\ \gamma & L(x) = k \\ 0 & L(x) < k \end{cases}\)

    \[ \left[0,\frac{1}{2}\right) , \left\{\frac{1}{2}\right\} , \left(\frac{1}{2},\frac{3}{2}\right) , \left\{\frac{3}{2}\right\} , \left(\frac{3}{2},\infty\right) . \]

    The corresponding significance levels are

    \[ \alpha = \mu_0(\varphi_{k,\gamma}) = \begin{cases} 1 & k \in [0,\frac{1}{2}) \\ 1\cdot \gamma + \frac{1}{2}\cdot (1-\gamma) & k = \frac{1}{2} \\ \frac{1}{2} & k \in (\frac{1}{2},\frac{3}{2}) \\ \frac{1}{2}\cdot \gamma + 0\cdot (1-\gamma) & k = \frac{3}{2} \\ 0 & k \in (\frac{3}{2},\infty) \end{cases}.\]
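
    A small sketch (Python; the helper alpha_level is introduced here purely for illustration) reproducing this case analysis numerically:

        # Compute the significance level mu_0(phi_{k,gamma}) of the randomized
        # test in the Bernoulli example (p0 = 1/2, p1 = 1/4).
        def alpha_level(k, gamma, p0=0.5, p1=0.25):
            level = 0.0
            for x, p0x in [(0, 1 - p0), (1, p0)]:    # p0x = mu_0({x})
                p1x = p1 if x == 1 else 1 - p1       # mu_1({x})
                L = p1x / p0x                        # likelihood ratio L(x)
                if L > k:
                    level += p0x                     # reject with probability 1
                elif L == k:
                    level += gamma * p0x             # reject with probability gamma
            return level

        for k in [0.25, 0.5, 1.0, 1.5, 2.0]:
            print(k, alpha_level(k, gamma=0.5))
        # prints 1, 0.75, 0.5, 0.25, 0, matching the five regions with gamma = 1/2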

Common Tests in Practice

Below each test you will find Assumptions → Hypotheses → Statistic → Null distribution → Decision rule → Effect size & power tips → Typical use‑case. Use the boldface keywords to scan quickly.

Notation.

  • \(\bar X=\tfrac1n\sum X_i\) is the sample mean.
  • \(s^2\) is the sample variance.
  • \(z_{1-\alpha}\) is the \((1-\alpha)\) quantile of \(N(0,1)\).
  • \(t_{\nu,1-\alpha}\) is that of Student's \(t\) with \(\nu\) degrees of freedom.
  • \(F_{\nu_1,\nu_2;1-\alpha}\) is that of the \(F\) distribution with \((\nu_1,\nu_2)\) degrees of freedom.

1 Tests for Means

1.1 One‑sample \(z\)‑test (known \(\sigma\))

  • Assumptions: i.i.d. normal or \(n\gtrsim30\) (CLT); population variance \(\sigma^2\) known.
  • Hypotheses: \(H_0\!:\mu=\mu_0\) vs \(H_1\!:\mu\ne\mu_0\) (two‑sided; adapt signs for one‑sided).
  • Statistic: \(Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}\).
  • Null dist.: \(Z\sim N(0,1)\) under \(H_0\).
  • Decision: two‑sided → reject if \(|Z|>z_{1-\alpha/2}\).
  • Effect size: Cohen’s \(d=(\bar X-\mu_0)/\sigma\).
  • Power: \(1-\beta=\Phi\bigl(-z_{1-\alpha/2}+|d|\sqrt n\bigr)\).
  • Use‑case: quality‑control when process s.d. is fixed by design.
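
A minimal sketch of this test in Python (scipy assumed available; the data and parameter values below are illustrative):

    import numpy as np
    from scipy import stats

    # One-sample z-test with known sigma, plus the power formula above.
    def z_test(x, mu0, sigma, alpha=0.05):
        n = len(x)
        z = (np.mean(x) - mu0) / (sigma / np.sqrt(n))
        p = 2 * stats.norm.sf(abs(z))                        # two-sided p-value
        return z, p, abs(z) > stats.norm.ppf(1 - alpha / 2)  # reject?

    def z_power(d, n, alpha=0.05):
        # 1 - beta = Phi(-z_{1-alpha/2} + |d| sqrt(n))
        return stats.norm.cdf(-stats.norm.ppf(1 - alpha / 2) + abs(d) * np.sqrt(n))

    rng = np.random.default_rng(1)
    x = rng.normal(0.3, 1.0, size=50)        # true mean 0.3, sigma = 1
    print(z_test(x, mu0=0.0, sigma=1.0))
    print("power at d=0.3, n=50:", z_power(0.3, 50))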

1.2 One‑sample \(t\)‑test (unknown \(\sigma\))

  • Assumptions: i.i.d. (approx.) normal; \(n\ge 5\) works reasonably.
  • Statistic: \(T=\frac{\bar X-\mu_0}{s/\sqrt n}\) with \(s^2\) the unbiased sample variance.
  • Null dist.: \(T\sim t_{n-1}\).
  • Decision: reject \(|T|>t_{n-1,1-\alpha/2}\).
  • Effect size: Hedges’ \(g=(\bar X-\mu_0)/s\) (small‑sample bias‑corrected).
  • Power: non‑central \(t\) with \(n-1\) d.f. and nc‑parameter \(\delta=g\sqrt n\); tables/software.
  • Use‑case: laboratory measurements, A/B tests with small \(n\).

1.3 Paired \(t\)‑test

  • Transform: differences \(D_i=Y_i^{\text{before}}-Y_i^{\text{after}}\); apply the one‑sample \(t\) to \(D\).
  • Advantage: controls for subject‑level variability; more power than unpaired when correlation \(\rho>0\).
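
A quick sketch of 1.2 and 1.3 with scipy (illustrative data; scipy's ttest_rel on the pairs matches ttest_1samp on the differences):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    x = rng.normal(10.4, 1.2, size=12)               # small lab sample
    print(stats.ttest_1samp(x, popmean=10.0))        # H_0: mu = 10, two-sided

    before = rng.normal(100, 15, size=20)
    after = before - rng.normal(3, 5, size=20)       # correlated pairs
    print(stats.ttest_rel(before, after))            # paired t-test
    # equivalently: stats.ttest_1samp(before - after, popmean=0.0)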

1.4 Two‑sample \(t\)‑tests

  • Pooled: assumes normal data with equal variances \(\sigma_1^2=\sigma_2^2\); statistic \(T_\text{pool}=\frac{\bar X_1-\bar X_2}{s_p\sqrt{1/n_1+1/n_2}}\) with \(s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\); null distribution \(t_{n_1+n_2-2}\).
  • Welch: assumes normal data, variances may differ; statistic \(T_\text{Welch}=\frac{\bar X_1-\bar X_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}}\); null distribution \(t_{\,\nu}\) with \(\nu\) given by the Welch–Satterthwaite approximation.
  • Hypotheses: \(H_0\!:\mu_1=\mu_2\).
  • Effect size: Cohen’s \(d_{\text{unpooled}}=(\bar X_1-\bar X_2)/\sqrt{(s_1^2+s_2^2)/2}\).
  • Use‑case: A/B conversion, clinical trials, algorithm benchmarks.
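
A sketch of both variants via scipy's ttest_ind, whose equal_var flag switches between the pooled and Welch forms (data illustrative):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a = rng.normal(0.0, 1.0, size=40)
    b = rng.normal(0.5, 2.0, size=60)               # unequal variances: Welch territory

    print(stats.ttest_ind(a, b, equal_var=True))    # pooled t-test
    print(stats.ttest_ind(a, b, equal_var=False))   # Welch's t-test

    # unpooled Cohen's d as defined above
    d = (np.mean(a) - np.mean(b)) / np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    print("unpooled Cohen's d:", d)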

2 Tests for Proportions

2.1 One‑proportion \(z\)‑test (score form)

  • Assumptions: \(np_0, n(1-p_0)\ge 10\).
  • Hypotheses: \(H_0\!:p=p_0\).
  • Statistic: \(Z=\frac{\hat p-p_0}{\sqrt{p_0(1-p_0)/n}}\) (the standard error is evaluated under \(H_0\), which is the score form; the Wald form uses \(\hat p(1-\hat p)/n\) instead).
  • Null dist.: \(N(0,1)\).
  • Decision: two‑sided \(|Z|>z_{1-\alpha/2}\).
  • Effect size: difference \(\hat p-p_0\) or odds‑ratio.
  • Caveat: for \(p\) near 0/1 or small \(n\), use exact binomial (Clopper–Pearson) or Wilson.
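
A sketch combining the \(z\) form with the exact binomial fallback from the caveat (scipy ≥ 1.7 assumed for stats.binomtest; counts illustrative):

    from scipy import stats

    n, successes, p0 = 200, 115, 0.5
    p_hat = successes / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    p_val = 2 * stats.norm.sf(abs(z))                 # two-sided p-value
    print(f"z = {z:.3f}, p = {p_val:.4f}")

    # exact test (Clopper-Pearson style CI), safe for small n or extreme p:
    res = stats.binomtest(successes, n, p=p0, alternative="two-sided")
    print("exact p =", res.pvalue, " 95% CI:", res.proportion_ci())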

2.2 Two‑proportion \(z\)‑test (A/B)

  • Statistic: \(Z=\frac{\hat p_1-\hat p_2}{\sqrt{\hat p(1-\hat p)(1/n_1+1/n_2)}}\) where \(\hat p\) is pooled.
  • Null dist.: \(N(0,1)\).
  • Power: driven by absolute difference \(\Delta=|p_1-p_2|\); use normal approximation or simulation.
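
A hand-rolled sketch of the pooled statistic plus a simulation-based power estimate (all counts and rates illustrative):

    import numpy as np
    from scipy import stats

    # Pooled two-proportion z-test.
    def two_prop_z(x1, n1, x2, n2):
        p1, p2 = x1 / n1, x2 / n2
        p = (x1 + x2) / (n1 + n2)                    # pooled estimate
        z = (p1 - p2) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        return z, 2 * stats.norm.sf(abs(z))

    print(two_prop_z(120, 1000, 150, 1000))

    # power for p1=0.12, p2=0.15, n=1000 per arm, alpha=0.05, by simulation:
    rng = np.random.default_rng(4)
    sims = 5000
    x1 = rng.binomial(1000, 0.12, sims)
    x2 = rng.binomial(1000, 0.15, sims)
    z, _ = two_prop_z(x1, 1000, x2, 1000)
    print("simulated power:", np.mean(np.abs(z) > stats.norm.ppf(0.975)))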

3 Variance and Scale Tests

3.1 \(\chi^2\) test for one variance

  • Assumptions: normality.
  • Hypotheses: \(H_0\!:\sigma^2=\sigma_0^2\).
  • Statistic: \(\chi^2=\frac{(n-1)s^2}{\sigma_0^2}\).
  • Null dist.: \(\chi^2_{n-1}\).
  • Decision: two‑sided → reject if \(\chi^2<\chi^2_{n-1,\alpha/2}\) or \(>\chi^2_{n-1,1-\alpha/2}\).
  • Use‑case: gauge R&R studies, measurement system variation.

3.2 \(F\)‑test for equality of two variances

  • Statistic: \(F=s_1^2/s_2^2\) with \(n_1-1, n_2-1\) d.f.
  • Decision: reject \(H_0\) if \(F\) falls outside \([F_{n_1-1,n_2-1;\alpha/2},\, F_{n_1-1,n_2-1;1-\alpha/2}]\).
  • Alternatives: Levene’s or Brown–Forsythe tests when normality is doubtful.
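
A sketch covering both tests of this section, plus the Brown–Forsythe fallback (data illustrative; normality assumed for 3.1 and 3.2):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = rng.normal(0, 1.3, size=30)
    y = rng.normal(0, 1.0, size=25)

    # 3.1: H_0: sigma^2 = 1
    n, s2, sigma0_sq = len(x), np.var(x, ddof=1), 1.0
    chi2 = (n - 1) * s2 / sigma0_sq
    p = 2 * min(stats.chi2.cdf(chi2, n - 1), stats.chi2.sf(chi2, n - 1))
    print(f"chi2 = {chi2:.2f}, two-sided p = {p:.4f}")

    # 3.2: F-test, H_0: sigma_1^2 = sigma_2^2
    F = np.var(x, ddof=1) / np.var(y, ddof=1)
    d1, d2 = len(x) - 1, len(y) - 1
    p = 2 * min(stats.f.cdf(F, d1, d2), stats.f.sf(F, d1, d2))
    print(f"F = {F:.2f}, two-sided p = {p:.4f}")

    # robust alternative when normality is doubtful:
    print(stats.levene(x, y, center="median"))   # Brown-Forsythe variant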

4 Goodness‑of‑Fit and Independence

  • \(\chi^2\) goodness-of-fit: categorical counts \((O_i)\); \(H_0\): the counts follow specified probabilities, with expected counts \((E_i)\); statistic \(\sum_i \frac{(O_i-E_i)^2}{E_i}\); null distribution \(\chi^2_{k-1}\); reject if the statistic exceeds \(\chi^2_{k-1,1-\alpha}\).
  • \(\chi^2\) independence: \(r\times c\) contingency table; \(H_0\): rows and columns are independent; same statistic, with \(E_i\) computed from the margins; null distribution \(\chi^2_{(r-1)(c-1)}\); same rule.
  • Kolmogorov–Smirnov: continuous univariate data; \(H_0\): \(F(x)=F_0(x)\); statistic \(D=\sup_x |F_n(x)-F_0(x)|\); null distribution the KS distribution (critical values tabulated; exact for small \(n\)).

Power tips: merge sparse cells (\(E_i<5\)) or use exact multinomial Monte Carlo.
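
A short scipy sketch of the three tests above (all counts and samples illustrative):

    import numpy as np
    from scipy import stats

    observed = np.array([18, 25, 22, 35])             # categorical counts O_i
    probs = np.array([0.25, 0.25, 0.25, 0.25])        # H_0 probabilities
    expected = probs * observed.sum()                 # E_i
    print(stats.chisquare(observed, f_exp=expected))  # chi^2 GOF, df = k-1

    table = np.array([[30, 10], [20, 40]])            # r x c contingency table
    chi2, p, dof, exp = stats.chi2_contingency(table)
    print(f"independence: chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")

    x = np.random.default_rng(6).normal(size=100)
    print(stats.kstest(x, "norm"))                    # KS vs fully specified F_0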


5 Non‑Parametric Rank Tests

  • Mann–Whitney \(U\): two independent samples, ordinal or continuous data; \(H_0\): the two distributions are equal; statistic \(U\), based on rank sums; exact null distribution available.
  • Wilcoxon signed‑rank: paired or one‑sample data, symmetric distribution; \(H_0\): median \(0\); statistic: sum of signed ranks; exact null distribution available.
  • Sign test: paired data, minimal assumptions; \(H_0\): median \(0\); statistic: number of positive differences; null distribution Binomial\((n,1/2)\).

Non‑parametric tests sacrifice some power (the asymptotic relative efficiency of the Wilcoxon tests versus the \(t\)-test under normality is \(3/\pi\approx 0.95\)) but require only continuity or symmetry assumptions.
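
A scipy sketch of the three rank tests (illustrative data; the sign test is carried out via its Binomial\((n,1/2)\) null with stats.binomtest):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    a = rng.exponential(1.0, size=15)              # skewed data: rank tests shine
    b = rng.exponential(1.5, size=18)

    print(stats.mannwhitneyu(a, b))                # two independent samples

    d = rng.normal(0.4, 1.0, size=14)              # paired differences
    print(stats.wilcoxon(d))                       # signed-rank, H_0: symmetric about 0

    k = int(np.sum(d > 0))                         # number of positive differences
    print(stats.binomtest(k, len(d), p=0.5))       # sign test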


Choosing the Right Test — Quick Checklist

  1. Data type: continuous, binary, counts, ranks?
  2. Assumptions: normality, equal variances, independence. Verify with QQ‑plots, Levene’s test, runs test.
  3. Design: paired vs unpaired, one‑sided vs two‑sided, balanced sample sizes.
  4. Effect size & power: decide minimally important difference before seeing data.
  5. Robustness: if assumptions fail, pivot to non‑parametrics or resampling (bootstrap, permutation); see the sketch below.
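
For item 5, here is a minimal permutation-test sketch for a difference in means (everything illustrative; the add-one correction keeps the p-value valid):

    import numpy as np

    def perm_test(a, b, n_perm=10_000, rng=None):
        rng = rng or np.random.default_rng()
        observed = np.mean(a) - np.mean(b)
        pooled = np.concatenate([a, b])
        count = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)                    # relabel under H_0
            diff = np.mean(pooled[:len(a)]) - np.mean(pooled[len(a):])
            if abs(diff) >= abs(observed):
                count += 1
        return (count + 1) / (n_perm + 1)          # add-one for a valid p-value

    rng = np.random.default_rng(8)
    a = rng.normal(0.0, 1.0, 30)
    b = rng.normal(0.6, 1.0, 30)
    print("permutation p-value:", perm_test(a, b, rng=rng))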

Power Analysis

Before collecting data, compute required sample size \(n\) to achieve target power \(1-\beta\) for expected effect size \(\delta\) at level \(\alpha\).
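
For example, under the usual normal approximation the per-group sample size for a two-sample comparison at effect size \(\delta\) (in s.d. units) is \(n \approx 2\left(\frac{z_{1-\alpha/2}+z_{1-\beta}}{\delta}\right)^2\); a minimal sketch:

    from scipy import stats

    # Required n per group for a two-sample comparison, normal approximation.
    def n_per_group(delta, alpha=0.05, power=0.8):
        za = stats.norm.ppf(1 - alpha / 2)
        zb = stats.norm.ppf(power)
        return 2 * ((za + zb) / delta) ** 2

    print(n_per_group(0.5))   # ~63 per group for a medium effect delta = 0.5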