Author: Eiko

Time: 2024-12-28 15:59:49 - 2025-01-03 12:56:59 (UTC)

Reference:

  • Theoretical Statistics by Robert W. Keener

  • Probability Theory by Eiko

  • Foundations of Modern Probability by Olav Kallenberg

Let \(\Omega\) be a parameter space with a partition \(\Omega = \Omega_0\cup \Omega_1\), let \(\theta\in \Omega\) be a parameter, and let \(X\sim \mathbb{P}(X|\theta)\) follow a certain law. The hypothesis \(H_i\) asserts that \(\theta\in \Omega_i\).

Hypothesis testing aims to tell which of the two competing hypotheses \(H_0\) or \(H_1\) is correct by observing \(X\).

Test Functions

Non-Randomized Tests

  • A non-randomized test of \(H_0\) versus \(H_1\) can be specified by a critical region \(S\), so if \(X\in S\) we reject \(H_0\) in favor of \(H_1\).

  • Power function \(\beta_S:\Omega\to \mathbb{R}\) describes the probability of rejecting \(H_0\) given \(\theta\)

    \[\beta_S(\theta) = \mathbb{P}(X\in S|\theta)\]

  • Significance level \(\alpha_S\) is the worst-case probability of falsely rejecting \(H_0\) when it is true:

    \[\alpha_S = \sup_{\theta\in \Omega_0} \beta_S(\theta).\]

    Ideally we would want \(\beta_S = 1_{\Omega_1}\), which would imply \(\alpha_S = 0\), but this is not achievable in practice. (A small numerical sketch of \(\beta_S\) and \(\alpha_S\) follows this list.)
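
As a small illustration of these two definitions, here is a Python sketch of the power function and significance level for the non-randomized test with critical region \(S = \{x < c\}\) when \(X\sim\text{Exponential}(\theta)\) and \(\Omega_0 = (0,\theta_0]\); the values of \(c\) and \(\theta_0\) are illustrative assumptions, not fixed by the text above.

```python
import numpy as np

# Power function of the non-randomized test with critical region S = {x < c}
# when X ~ Exponential(theta): beta_S(theta) = P(X < c | theta) = 1 - exp(-theta * c).
def power(theta, c):
    return 1.0 - np.exp(-theta * c)

c = 1.0                                        # illustrative critical value (assumption)
theta0 = 1.0                                   # Omega_0 = (0, theta0] (assumption)
thetas_null = np.linspace(1e-3, theta0, 1000)  # grid over Omega_0

# Significance level: worst-case rejection probability over Omega_0.
# beta_S is increasing in theta, so the supremum is attained at theta0.
alpha_S = power(thetas_null, c).max()
print(f"alpha_S = {alpha_S:.4f}  (exact value 1 - exp(-theta0*c) = {1 - np.exp(-theta0 * c):.4f})")
```

Since \(\beta_S(\theta) = 1 - e^{-\theta c}\) is increasing in \(\theta\), the supremum over \(\Omega_0\) is attained at \(\theta_0\).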

Randomized Tests

Sometimes, instead of giving a critical region \(S\) (or equivalently the indicator function \(1_S\)), we give a critical function \(\varphi(x)\in[0,1]\), the probability of rejecting \(H_0\) upon observing \(X = x\). A non-randomized test is then just the special case \(\varphi = 1_S\).

In this case, the power function is

\[ \beta_\varphi(\theta) = \mathbb{E}(\varphi(X)|\theta) \]

and the significance level is

\[ \alpha_\varphi = \sup_{\theta\in \Omega_0} \beta_\varphi(\theta) = \sup_{\theta\in \Omega_0} \mathbb{E}(\varphi(X)|\theta). \]

The main advantage of randomized tests is that they are closed under (convex) linear combinations.
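
Concretely, if \(\varphi_1,\varphi_2\) are critical functions and \(\lambda\in[0,1]\), then \(\lambda\varphi_1+(1-\lambda)\varphi_2\) is again a critical function, and by linearity of the expectation its power function is the corresponding convex combination:

\[ \beta_{\lambda\varphi_1+(1-\lambda)\varphi_2}(\theta) = \mathbb{E}(\lambda\varphi_1(X)+(1-\lambda)\varphi_2(X)\,|\,\theta) = \lambda\,\beta_{\varphi_1}(\theta) + (1-\lambda)\,\beta_{\varphi_2}(\theta). \]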

Simple Hypothesis And Simple Tests

A hypothesis is simple if \(\Omega_i\) is a singleton.

Neyman-Pearson Lemma

Assume \(H_0\) and \(H_1\) are both simple, say \(\Omega_0 = \{\theta_0\}\) and \(\Omega_1 = \{\theta_1\}\). In this case the Neyman-Pearson Lemma describes all reasonable tests. Let \(\mu_1 = \mathbb{P}(X|\theta_1)\) and \(\mu_0 = \mathbb{P}(X|\theta_0)\) be the distributions of \(X\) under \(H_1\) and \(H_0\) respectively.

We have

\[\alpha_\varphi = \mu_0(\varphi) = \int \varphi(x) \mu_0(dx)\] \[\beta_\varphi(\theta_i) = \mu_i(\varphi) = \int \varphi(x) \mu_i(dx).\]

We would like to choose \(\varphi\) so that \(\mu_0(\varphi)\) is close to \(0\) and \(\mu_1(\varphi)\) is close to \(1\). Consider maximizing \(\beta_\varphi(\theta_1)\) subject to \(\alpha_\varphi\le \alpha\).
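
On a finite sample space this constrained maximization is a linear program in the values \(\varphi(x)\in[0,1]\). The following Python sketch (using scipy's linprog) solves it for the Bernoulli example appearing at the end of these notes; the target level \(\alpha = 0.3\) is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import linprog

# On a finite sample space, "maximize mu_1(phi) subject to mu_0(phi) <= alpha
# and 0 <= phi <= 1" is a linear program in the values phi(x).
# Illustrative data (assumption): the Bernoulli example at the end of these notes,
# sample space {0, 1}, H0: p = 1/2, H1: p = 1/4, target level alpha = 0.3.
p0 = np.array([0.5, 0.5])    # mu_0({0}), mu_0({1})
p1 = np.array([0.75, 0.25])  # mu_1({0}), mu_1({1})
alpha = 0.3

res = linprog(c=-p1,                                 # minimize -mu_1(phi) <=> maximize mu_1(phi)
              A_ub=p0.reshape(1, -1), b_ub=[alpha],  # constraint mu_0(phi) <= alpha
              bounds=[(0, 1)] * 2)                   # phi(x) in [0, 1]
phi = res.x
print("phi =", phi, " level =", p0 @ phi, " power =", p1 @ phi)
# Expected (up to solver tolerance): phi = [0.6, 0.0], level = 0.3, power = 0.45,
# i.e. a randomized likelihood ratio test with cutoff k = 3/2 and gamma = 0.6.
```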

Lagrange Multiplier Lemma

  • Let \(k\ge 0\) be any constant. Then any function \(\varphi^*\) maximizing \(\mu_1(\varphi) - k\mu_0(\varphi)\) also maximizes \(\mu_1(\varphi)\) subject to \(\mu_0(\varphi)\le \alpha\), where \(\alpha = \mu_0(\varphi^*)\).

    \[\begin{align*} \varphi^* &\in \mathrm{argmax}_\varphi \left(\mu_1(\varphi) - k \mu_0(\varphi) \right) \\ &\subset \mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi). \end{align*}\]

  • Moreover, for \(k>0\), any function \(\varphi^{**}\) maximizing \(\mu_1(\varphi)\) subject to \(\mu_0(\varphi)\le \alpha\) must have \(\mu_0(\varphi^{**}) = \alpha\).

    \[ \varphi^{**}\in \mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi) \subset \{ \varphi: \mu_0(\varphi) = \alpha \}.\]

Note that \(\varphi^*\) and \(\alpha\) depend on \(k\) here.

Proof.

  • Let \(\varphi^*\) be the maximizing function. It suffices to prove that \(\mu_0(\varphi)\le \alpha \Rightarrow \mu_1(\varphi)\le \mu_1(\varphi^*)\).

    We have

    \[ \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) -\alpha).\]

    Since \(\mu_0(\varphi) - \alpha \le 0\), we have \(-k(\mu_0(\varphi) - \alpha)\ge 0\), therefore

    \[\begin{align*} \mu_1(\varphi) &\le \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \\ &\le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) -\alpha) \\ &= \mu_1(\varphi^*). \end{align*}\]

  • We know that \(\varphi^{**}\) and \(\varphi^*\) are both in \(\mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi)\). Therefore \(\mu_1(\varphi^{**})=\mu_1(\varphi^*)\). The fact that \(\varphi^*\in \mathrm{argmax}(\mu_1(\varphi) - k\mu_0(\varphi))\) implies

    \[\begin{align*} \mu_1(\varphi^*) - k\mu_0(\varphi^*) &\ge \mu_1(\varphi^{**}) - k\mu_0(\varphi^{**}) \\ &= \mu_1(\varphi^*) - k\mu_0(\varphi^{**}). \end{align*}\]

    Therefore \(k\mu_0(\varphi^{**}) \ge k\mu_0(\varphi^*)\), and since \(k>0\) this gives \(\mu_0(\varphi^{**}) \ge \mu_0(\varphi^*) = \alpha\) (by the definition of \(\alpha\)); combined with the constraint \(\mu_0(\varphi^{**})\le \alpha\), we conclude \(\mu_0(\varphi^{**}) = \alpha\).

How To Maximize \(\mu_1(\varphi)-k\mu_0(\varphi)\)?

We know that \(\mu_1 - k\mu_0\) is a finite signed measure, so by the Hahn-Jordan decomposition it can be uniquely decomposed into the difference of two mutually singular finite measures

\[ \mu_1 - k \mu_0 = \nu_+ - \nu_- .\]

So maximizing \(\mu_1(\varphi) - k\mu_0(\varphi)\) is equivalent to maximizing \(\nu_+(\varphi) - \nu_-(\varphi)\), and it is then clear that we can pick \(\varphi = 1_{A_+}\), where \(A_+\) is the set on which \(\nu_+\) is concentrated; we are free to choose any values in \([0,1]\) on sets of \(|\mu_1 - k\mu_0|\)-measure zero.

If the \(\mu_i\) have densities with respect to a common dominating measure \(\mu\), then the set \(A_+\) is simply \(\left\{x: \frac{\mathrm{d} \mu_1}{\mathrm{d} \mu} > k \frac{\mathrm{d} \mu_0}{\mathrm{d} \mu}\right\}\). This can be seen as a slight generalization of a likelihood ratio test: ignoring the division-by-zero problem, it can be written as \(\left\{\frac{\mathrm{d} \mu_1}{\mathrm{d} \mu_0} > k\right\}\).
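
A minimal discrete sketch of this construction, assuming a small finite sample space with made-up masses \(p_0, p_1\): the positive set is \(A_+ = \{x : p_1(x) > k\,p_0(x)\}\), and \(\varphi\) may take any value in \([0,1]\) where \(p_1(x) = k\,p_0(x)\).

```python
import numpy as np

# Maximizing mu_1(phi) - k*mu_0(phi) on a finite sample space: the positive part of
# the signed measure mu_1 - k*mu_0 sits on A_plus = {x : p1(x) > k*p0(x)}, so we take
# phi = 1 on A_plus, phi = 0 where p1(x) < k*p0(x), and phi arbitrary in [0, 1] on the
# boundary {p1(x) = k*p0(x)}, which is a |mu_1 - k*mu_0|-null set.
# Illustrative data (assumption): a 4-point sample space with made-up masses.
p0 = np.array([0.1, 0.2, 0.3, 0.4])
p1 = np.array([0.4, 0.3, 0.2, 0.1])
k = 1.0

phi = np.where(p1 > k * p0, 1.0, 0.0)   # any value in [0, 1] is allowed where p1 == k*p0
print("A_plus indicator:", phi)
print("objective mu_1(phi) - k*mu_0(phi) =", p1 @ phi - k * (p0 @ phi))
```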

The Neyman-Pearson Lemma

The Lemma states that, for a simple testing scenario, given any level \(\alpha\in [0,1]\) there exists a likelihood ratio test \(\varphi_\alpha\) (meaning \(1_{L>k}\le \varphi_\alpha\le 1_{L\ge k}\), up to different values on a measure zero set) with exactly level \(\alpha\), i.e. \(\mu_0(\varphi_\alpha)=\alpha\). Such a \(\varphi_\alpha\) maximizes \(\mu_1(\varphi) - k\mu_0(\varphi)\), and any likelihood ratio test maximizes the power \(\beta_{\varphi_\alpha}(\theta_1)\) subject to the significance level constraint \(\alpha_{\varphi_\alpha} \le \alpha\).
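
The existence of a likelihood ratio test with exactly level \(\alpha\) is constructive on a finite sample space: take \(k\) so that \(\mu_0(L > k)\le \alpha\le \mu_0(L\ge k)\) and randomize on \(\{L = k\}\). Here is a Python sketch, assuming the Bernoulli example from the end of these notes and an illustrative \(\alpha = 0.3\).

```python
import numpy as np

# Existence part of the Neyman-Pearson Lemma on a finite sample space: pick the cutoff
# k with mu_0(L > k) <= alpha <= mu_0(L >= k), then randomize with gamma on {L = k}
# so that the likelihood ratio test has level exactly alpha.
# Illustrative data (assumption): the Bernoulli example, H0: p = 1/2, H1: p = 1/4.
p0 = np.array([0.5, 0.5])    # mu_0 on {0, 1}
p1 = np.array([0.75, 0.25])  # mu_1 on {0, 1}
alpha = 0.3

L = p1 / p0                                                   # likelihood ratio dmu_1/dmu_0
k = min(v for v in np.unique(L) if p0[L > v].sum() <= alpha)  # smallest admissible cutoff
tail = p0[L > k].sum()                                        # mu_0(L > k)
boundary = p0[L == k].sum()                                   # mu_0(L = k)
gamma = (alpha - tail) / boundary if boundary > 0 else 0.0

phi = np.where(L > k, 1.0, np.where(L == k, gamma, 0.0))
print(f"k = {k}, gamma = {gamma:.2f}, level = {p0 @ phi:.2f}, power = {p1 @ phi:.2f}")
# Expected: k = 1.5, gamma = 0.60, level = 0.30, power = 0.45.
```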

Some Detailed Results Relating To Neyman-Pearson Lemma

  • For \(\alpha\in [0,1]\), let \(k\) be a critical value for a likelihood ratio test \(\varphi_\alpha\) in the sense of the Neyman-Pearson Lemma, i.e.

    \[\varphi_\alpha = 1_{\left\{x:\frac{\mathrm{d} \mu_1}{\mathrm{d} \mu_0}(x) > k\right\}} \text{ a.e. in } |\mu_1 - k\mu_0|.\]

    Then \(\mu_0(\varphi_\alpha) = \alpha\) and \(\mu_1(\varphi_\alpha) = \beta_{\varphi_\alpha}(\theta_1)\).

    We have

    \[\varphi^{**}\in \mathrm{argmax}_{\mu_0\le \alpha}\mu_1 \Rightarrow \varphi^{**}=\varphi_\alpha \text{ a.e. in } |\mu_1 - k\mu_0|.\]

  • If \(\mu_0\neq \mu_1\) or \(k\neq 1\), and \(\varphi_\alpha\) is a likelihood ratio test with level \(\alpha\in (0,1)\), then \(\mu_1(\varphi_\alpha) > \alpha\). In particular,

    \[\mu_0\neq \mu_1\Rightarrow \mu_1(\varphi_\alpha)>\alpha. \]

Proof.

  • We already proved that \(\mu_0(\varphi^{**})=\mu_0(\varphi_\alpha) = \alpha\); since both tests also attain the maximal value of \(\mu_1\),

    \[ (\mu_1 - k\mu_0)(\varphi^{**}) = (\mu_1 - k\mu_0)(\varphi_\alpha).\]

    By the construction of \(\varphi_\alpha\), the measure \((\varphi_\alpha - \varphi^{**})\,\mathrm{d}(\mu_1 - k\mu_0)\) is nonnegative (\(\varphi_\alpha - \varphi^{**}\ge 0\) on the positive set of \(\mu_1 - k\mu_0\) and \(\le 0\) on its negative set), and by the displayed equality its total mass is zero. This implies \(\varphi^{**}= \varphi_\alpha\) a.e. in \(|\mu_1 - k\mu_0|\).

  • Consider the constant test \(\varphi_c \equiv \alpha\in (0,1)\). By \(\varphi_\alpha\in \mathrm{argmax}_{\mu_0\le\alpha}\mu_1\) we know \(\mu_1(\varphi_\alpha)\ge \mu_1(\varphi_c) = \alpha\). If equality held, then \(\varphi_c\) would also be a maximizer, hence \(\varphi_c = \varphi_\alpha\) a.e. in \(|\mu_1 - k\mu_0|\). But \(\varphi_\alpha\in \{0,1\}\) a.e. in \(|\mu_1 - k\mu_0|\) while \(\varphi_c \equiv \alpha\in(0,1)\), so this forces \(|\mu_1 - k\mu_0| = 0\), i.e. \(\mu_1 = k\mu_0\); since both are probability measures, \(k=1\) and \(\mu_0=\mu_1\), contradicting the hypothesis.

Examples

  • Suppose we are testing

    \[\mathbb{P}(X|\theta) = \text{Exponential}(\theta) = \theta e^{-\theta x}1_{x\ge 0}\,\mathrm{d}{x}\] with hypotheses \(H_0: \theta = \theta_0\) and \(H_1:\theta=\theta_1\); for simplicity assume \(\theta_1>\theta_0\). The likelihood ratio test is of the form

    \[ \frac{\theta_1e^{-\theta_1x}}{\theta_0e^{-\theta_0x}} > k \Leftrightarrow x < \frac{1}{\theta_1-\theta_0}\log\frac{\theta_1}{k\theta_0} = x_k.\]

    Its level under \(H_0\) is

    \[\alpha = \mu_0(\varphi) = \int_0^{x_k} \theta_0e^{-\theta_0x}\,\mathrm{d}{x} = 1 - e^{-\theta_0x_k},\]

    which we can invert to get

    \[ x_k = \frac{1}{\theta_0}\log\frac{1}{1-\alpha}.\]

    The test with level \(\alpha\) is therefore simply given by \(\varphi_\alpha = 1_{x<\frac{1}{\theta_0}\log \frac{1}{1-\alpha}}\). Some magic is happening here: this test is optimal, as it maximizes \(\mu_1(\varphi)\) among tests of level \(\le \alpha\), yet it is independent of \(\theta_1\)! (This is an example of a Uniformly Most Powerful Test; an interesting question is when this happens. See the numerical check after these examples.)

  • Consider a very simple random variable \(X\sim \text{Bernoulli}(p)\), with \(H_0: p=\frac{1}{2}\) and \(H_1: p=\frac{1}{4}\). The likelihood ratio is

    \[ L(x) = \begin{cases} \frac{1}{2} & x=1 \\ \frac{3}{2} & x=0. \end{cases}\]

    Then clearly there are \(5\) different regions of \(k\) we can take to form different tests \(\varphi = \begin{cases} 1 & L(x) > k \\ \gamma & L(x) = k \\ 0 & L(x) < k \end{cases}\)

    \[ \left[0,\frac{1}{2}\right) , \left\{\frac{1}{2}\right\} , \left(\frac{1}{2},\frac{3}{2}\right) , \left\{\frac{3}{2}\right\} , \left(\frac{3}{2},\infty\right) . \]

    The corresponding significance levels are

    \[ \alpha = \mu_0(\varphi_{k,\gamma}) = \begin{cases} 1 & k \in [0,\frac{1}{2}) \\ 1\cdot \gamma + \frac{1}{2}\cdot (1-\gamma) & k = \frac{1}{2} \\ \frac{1}{2} & k \in (\frac{1}{2},\frac{3}{2}) \\ \frac{1}{2}\cdot \gamma + 0\cdot (1-\gamma) & k = \frac{3}{2} \\ 0 & k \in (\frac{3}{2},\infty) \end{cases}.\]
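
A quick numerical check of the exponential example, with illustrative values of \(\theta_0\), \(\alpha\), and the \(\theta_1\) grid (assumptions for the sketch): the test \(\varphi_\alpha = 1_{x < x_k}\) has level exactly \(\alpha\) under \(\theta_0\) and power \(1-(1-\alpha)^{\theta_1/\theta_0}\) under \(\theta_1\), while the rejection region itself never uses \(\theta_1\).

```python
import numpy as np

# Check of the exponential example: the level-alpha test phi_alpha = 1_{x < x_k} with
# x_k = (1/theta0) * log(1/(1 - alpha)) has level alpha under theta0, and its power
# under theta1 is 1 - (1 - alpha)^(theta1/theta0).  The rejection region never uses
# theta1, so the same test is most powerful for every theta1 > theta0.
theta0, alpha = 1.0, 0.05                     # illustrative values (assumption)
x_k = np.log(1.0 / (1.0 - alpha)) / theta0

level = 1.0 - np.exp(-theta0 * x_k)           # mu_0(x < x_k)
print(f"level = {level:.4f}  (target alpha = {alpha})")

for theta1 in [1.5, 2.0, 5.0]:                # illustrative alternatives (assumption)
    power = 1.0 - np.exp(-theta1 * x_k)       # mu_1(x < x_k)
    closed_form = 1.0 - (1.0 - alpha) ** (theta1 / theta0)
    print(f"theta1 = {theta1}: power = {power:.4f} (closed form {closed_form:.4f})")
```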