References:
Theoretical Statistics by Robert W. Keener
Probability Theory by Eiko
Foundations of Modern Probability by Olav Kallenberg
Let \(\Omega\) be a parameter space, \(\Omega = \Omega_0\cup \Omega_1\) a partition of this space, and \(\theta\in\Omega\) a parameter. Let \(X\sim \mathbb{P}(X|\theta)\) be an observation whose law depends on \(\theta\). \(H_i\) is the hypothesis that \(\theta\in \Omega_i\).
Hypothesis testing aims to tell which of the two competing hypotheses \(H_0\) or \(H_1\) is correct by observing \(X\).
A non-randomized test of \(H_0\) versus \(H_1\) can be specified by a critical region \(S\), so if \(X\in S\) we reject \(H_0\) in favor of \(H_1\).
The power function \(\beta_S:\Omega\to \mathbb{R}\) describes the probability of rejecting \(H_0\) given \(\theta\):
\[\beta_S(\theta) = \mathbb{P}(X\in S|\theta)\]
The significance level \(\alpha_S\) is the worst-case probability of falsely rejecting \(H_0\) when it is true (a Type I error), taken over the null region:
\[\alpha_S = \sup_{\theta\in \Omega_0} \beta_S(\theta).\]
In theory we would want \(\beta_S(\theta) = 1_{\Omega_1}(\theta)\), which would imply \(\alpha_S = 0\), but this is not possible in practice.
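To make these definitions concrete, here is a minimal Monte Carlo sketch for a hypothetical normal location model (this example is not from the references above): \(X\sim N(\theta,1)\), \(\Omega_0 = \{\theta\le 0\}\), \(\Omega_1 = \{\theta>0\}\), and critical region \(S = \{x > c\}\) for an arbitrary cutoff \(c\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: X ~ N(theta, 1), critical region S = {x > c}.
c = 1.645  # arbitrary cutoff, for illustration only

def power(theta, n_sim=100_000):
    """Monte Carlo estimate of beta_S(theta) = P(X in S | theta)."""
    x = rng.normal(loc=theta, scale=1.0, size=n_sim)
    return np.mean(x > c)

# Power function on a grid of parameters.
thetas = np.linspace(-2.0, 3.0, 11)
betas = [power(t) for t in thetas]

# Significance level: worst case over the null region Omega_0 = {theta <= 0},
# approximated here by a grid of null parameters.
alpha_S = max(power(t) for t in np.linspace(-2.0, 0.0, 5))

print("beta_S on grid:", np.round(betas, 3))
print("approximate alpha_S:", round(alpha_S, 3))
```

For this family the power is increasing in \(\theta\), so the supremum over \(\Omega_0\) is attained at the boundary point \(\theta = 0\).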
Sometimes, instead of giving a critical region \(S\) (or equivalently the indicator \(1_S\)), we give a critical function \(\varphi(x)\in[0,1]\), interpreted as the probability of rejecting \(H_0\) when \(X = x\). A non-randomized test is then just the special case \(\varphi = 1_S\).
In this case, the power function is
\[ \beta_\varphi(\theta) = \mathbb{E}(\varphi(X)|\theta) \]
and the significance level is
\[ \alpha_\varphi = \sup_{\theta\in \Omega_0} \beta_\varphi(\theta) = \sup_{\theta\in \Omega_0} \mathbb{E}(\varphi(X)|\theta). \]
The main advantage of randomized tests is that they are closed under (convex) linear combinations.
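Concretely, if \(\varphi_1\) and \(\varphi_2\) are critical functions and \(\lambda\in[0,1]\), then \(\lambda\varphi_1+(1-\lambda)\varphi_2\) is again a critical function, and by linearity of expectation
\[ \beta_{\lambda\varphi_1+(1-\lambda)\varphi_2}(\theta) = \lambda\,\beta_{\varphi_1}(\theta) + (1-\lambda)\,\beta_{\varphi_2}(\theta), \]
so the level of the mixture is at most \(\lambda\alpha_{\varphi_1}+(1-\lambda)\alpha_{\varphi_2}\). Nothing of this kind holds for critical regions, since a mixture of indicator functions is in general not an indicator.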
A hypothesis is simple if \(\Omega_i\) is a singleton.
Assume \(H_0\) and \(H_1\) are both simple; in this case the Neyman-Pearson Lemma describes all reasonable tests. Let \(\mu_1 = \mathbb{P}(X|\theta_1)\) and \(\mu_0 = \mathbb{P}(X|\theta_0)\) be the distributions of \(X\) under \(H_1\) and \(H_0\) respectively.
We have
\[\alpha_\varphi = \mu_0(\varphi) = \int \varphi(x) \mu_0(dx)\] \[\beta_\varphi(\theta_i) = \mu_i(\varphi) = \int \varphi(x) \mu_i(dx).\]
We would like to choose \(\varphi\) so that \(\mu_0(\varphi)\) is as close to \(0\) and \(\mu_1(\varphi)\) as close to \(1\) as possible. Consider maximizing \(\beta_\varphi(\theta_1)\) subject to \(\alpha_\varphi\le \alpha\).
Let \(k\ge 0\) be any constant. Then any \(\varphi^*\) maximizing \(\mu_1(\varphi) - k\mu_0(\varphi)\) also maximizes \(\mu_1(\varphi)\) subject to \(\mu_0(\varphi)\le \alpha\), where \(\alpha = \mu_0(\varphi^*)\).
\[\begin{align*} \varphi^* &\in \mathrm{argmax}_\varphi \left(\mu_1(\varphi) - k \mu_0(\varphi) \right) \\ &\subset \mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi). \end{align*}\]
Moreover, any function \(\varphi^{**}\) maximizing \(\mu_1(\varphi)\) subject to \(\mu_0(\varphi)\le \alpha\) must have \(\mu_0(\varphi^{**}) = \alpha\).
\[ \varphi^{**}\in \mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi) \subset \{ \varphi: \mu_0(\varphi) = \alpha \}.\]
Note that \(\varphi^*\) and \(\alpha\) depend on \(k\) here.
Proof.
Let \(\varphi^*\) be the maximizing function. It suffices to prove that \(\mu_0(\varphi)\le \alpha \Rightarrow \mu_1(\varphi)\le \mu_1(\varphi^*)\).
We have
\[ \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) -\alpha)\]
Since \(\mu_0(\varphi) - \alpha \le 0\), we have \(-k(\mu_0(\varphi) - \alpha)\ge 0\), therefore
\[\begin{align*} \mu_1(\varphi) &\le \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \\ &\le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) -\alpha) \\ &= \mu_1(\varphi^*). \end{align*}\]
We know that \(\varphi^{**}\) and \(\varphi^*\) are both in \(\mathrm{argmax}_{\mu_0(\varphi)\le \alpha} \mu_1(\varphi)\). Therefore \(\mu_1(\varphi^{**})=\mu_1(\varphi^*)\). The fact that \(\varphi^*\in \mathrm{argmax}(\mu_1(\varphi) - k\mu_0(\varphi))\) implies
\[\begin{align*} \mu_1(\varphi^*) - k\mu_0(\varphi^*) &\ge \mu_1(\varphi^{**}) - k\mu_0(\varphi^{**}) \\ &= \mu_1(\varphi^*) - k\mu_0(\varphi^{**}). \end{align*}\]
Therefore \(\mu_0(\varphi^{**}) \ge \mu_0(\varphi^*) = \alpha\); combined with the constraint \(\mu_0(\varphi^{**})\le \alpha\), this gives \(\mu_0(\varphi^{**}) = \alpha\).
We know that \(\mu_1 - k\mu_0\) is a finite signed measure, so by the Hahn-Jordan decomposition it can be uniquely written as the difference of two mutually singular finite measures
\[ \mu_1 - k \mu_0 = \nu_+ - \nu_- .\]
So maximizing \(\mu_1(\varphi) - k\mu_0(\varphi)\) is equivalent to maximizing \(\nu_+(\varphi) - \nu_-(\varphi)\), from which it is clear that we can pick \(\varphi = 1_{A_+}\), where \(A_+\) is a set on which \(\nu_+\) is concentrated and \(\nu_-(A_+)=0\); we are also free to pick any value in \([0,1]\) on any set of \(|\mu_1 - k\mu_0|\)-measure zero.
If the \(\mu_i\) have densities with respect to a common dominating measure \(\mu\), then the set \(A_+\) is simply \(\left\{x: \frac{\mathrm{d} \mu_1}{\mathrm{d} \mu}(x) > k \frac{\mathrm{d} \mu_0}{\mathrm{d} \mu}(x)\right\}\). This can be seen as a slight generalization of a likelihood ratio test: ignoring the division-by-zero problem, it can be written as \(\left\{\frac{\mathrm{d} \mu_1}{\mathrm{d} \mu_0} > k\right\}\).
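A finite toy example (my own illustration, not taken from the references) makes this concrete: for two probability vectors \(\mu_0,\mu_1\) on a four-point sample space, the sketch below forms the signed measure \(\mu_1 - k\mu_0\), takes its positive and negative parts, and reads off \(A_+\) together with the level and power of \(\varphi = 1_{A_+}\).

```python
import numpy as np

# Hypothetical discrete model on {0, 1, 2, 3} (illustration only).
mu0 = np.array([0.4, 0.3, 0.2, 0.1])   # law of X under H_0
mu1 = np.array([0.1, 0.2, 0.3, 0.4])   # law of X under H_1
k = 1.0

# Signed measure mu1 - k*mu0 and its Hahn-Jordan decomposition.
signed = mu1 - k * mu0
nu_plus = np.clip(signed, 0, None)     # positive part
nu_minus = np.clip(-signed, 0, None)   # negative part

# A_+ is where the density of mu1 exceeds k times that of mu0.
A_plus = signed > 0
phi = A_plus.astype(float)             # the test 1_{A_+}

print("A_+ =", np.nonzero(A_plus)[0])                  # points where we reject H_0
print("level  mu0(phi) =", phi @ mu0)
print("power  mu1(phi) =", phi @ mu1)
print("objective mu1(phi) - k*mu0(phi) =", phi @ mu1 - k * (phi @ mu0))
print("which equals nu_+ of the whole space:", nu_plus.sum())
```

The last two lines illustrate why \(1_{A_+}\) is a maximizer: the objective can never exceed \(\nu_+\) of the whole space, and \(1_{A_+}\) attains that value.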
The Lemma states that, for a simple testing problem, given any level \(\alpha\in [0,1]\) there exists a likelihood ratio test \(\varphi_\alpha\) (meaning \(1_{L>k}\le \varphi_\alpha\le 1_{L\ge k}\) where \(L = \frac{\mathrm{d}\mu_1}{\mathrm{d}\mu_0}\), possibly modified on a set of measure zero) with level exactly \(\alpha\) (i.e. \(\mu_0(\varphi_\alpha)=\alpha\)). Such a \(\varphi_\alpha\) maximizes \(\mu_1(\varphi) - k\mu_0(\varphi)\), and any likelihood ratio test of level \(\alpha\) maximizes the power \(\beta_{\varphi}(\theta_1)\) subject to the significance level constraint \(\alpha_{\varphi} \le \alpha\).
For \(\alpha\in [0,1]\), let \(k\) be a critical value for a likelihood ratio test \(\varphi_\alpha\) in the sense of the Neyman-Pearson Lemma, i.e.
\[\varphi_\alpha = 1_{\left\{x:\frac{\mathrm{d} \mu_1}{\mathrm{d} \mu_0}(x) > k\right\}} \text{ a.e. in } |\mu_1 - k\mu_0|.\]
Then \(\mu_0(\varphi_\alpha) = \alpha\) and \(\mu_1(\varphi_\alpha) = \beta_{\varphi_\alpha}(\theta_1)\).
We have
\[\varphi^{**}\in \mathrm{argmax}_{\mu_0\le \alpha}\mu_1 \Rightarrow \varphi^{**}=\varphi_\alpha \text{ a.e. in } |\mu_1 - k\mu_0|.\]
If \(\mu_0\neq \mu_1\) or \(k\neq 1\), and \(\varphi_\alpha\) is a likelihood ratio test with level \(\alpha\in (0,1)\), then \(\mu_1(\varphi_\alpha) > \alpha\).
\[\mu_0\neq \mu_1\Rightarrow \mu_1(\varphi_\alpha)>\alpha. \]
Proof.
We already proved that \(\mu_0(\varphi^{**})=\mu_0(\varphi_\alpha) = \alpha\), and since both maximize \(\mu_1\) under this constraint,
\[ (\mu_1 - k\mu_0)(\varphi^{**}) = (\mu_1 - k\mu_0)(\varphi_\alpha),\]
i.e. \(\int(\varphi_\alpha - \varphi^{**})\,\mathrm{d}(\mu_1 - k\mu_0) = 0\). By the construction of \(\varphi_\alpha\) (it equals \(1\) where \(\mu_1 - k\mu_0\) is positive and \(0\) where it is negative), the integrand here is nonnegative pointwise, so it must vanish a.e.; this implies \(\varphi^{**}= \varphi_\alpha\) a.e. in \(|\mu_1 - k\mu_0|\).
Consider the constant test \(\varphi_c \equiv \alpha\in (0,1)\). Since \(\varphi_\alpha\in \mathrm{argmax}_{\mu_0\le\alpha}\mu_1\), we know \(\mu_1(\varphi_\alpha)\ge \mu_1(\varphi_c) = \alpha\). If equality holds, then \(\varphi_c\) is also in this set, thus \(\varphi_c = \varphi_\alpha\) a.e. in \(|\mu_1 - k\mu_0|\); but \(\varphi_\alpha\in \{0,1\}\) a.e. in \(|\mu_1 - k\mu_0|\) while \(\varphi_c \equiv \alpha\in(0,1)\), so this can only happen when \(|\mu_1 - k\mu_0| = 0\), i.e. \(\mu_1 = k\mu_0\). Since both are probability measures, this forces \(k=1\) and \(\mu_0=\mu_1\).
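Putting the pieces together, here is a sketch (my own, reusing the toy discrete model from the earlier sketch) of how one can construct a likelihood ratio test with exact level \(\alpha\): choose \(k\) so that \(\mu_0(L > k)\le \alpha\le \mu_0(L\ge k)\), then randomize on \(\{L = k\}\) with the \(\gamma\) that fills the gap.

```python
import numpy as np

def lr_test(mu0, mu1, alpha):
    """Construct (k, gamma) so that the randomized likelihood ratio test
    phi = 1 on {L > k}, gamma on {L = k}, 0 on {L < k}
    has exact level alpha under mu0 (finite sample space sketch)."""
    L = mu1 / mu0                        # likelihood ratio at each sample point
    order = np.argsort(-L)               # points sorted by decreasing L
    cum = np.cumsum(mu0[order])          # mu0-mass of the top-ranked points
    # k is the ratio at the first point whose inclusion would push the mass past alpha.
    idx = min(np.searchsorted(cum, alpha, side="right"), len(L) - 1)
    k = L[order][idx]
    above = mu0[L > k].sum()             # mass rejected with probability 1
    at_k = mu0[L == k].sum()             # mass available for randomization
    gamma = 0.0 if at_k == 0 else (alpha - above) / at_k
    return k, gamma

# Hypothetical discrete model (illustration only).
mu0 = np.array([0.4, 0.3, 0.2, 0.1])
mu1 = np.array([0.1, 0.2, 0.3, 0.4])
k, gamma = lr_test(mu0, mu1, alpha=0.25)

L = mu1 / mu0
phi = np.where(L > k, 1.0, np.where(L == k, gamma, 0.0))
print("k =", k, " gamma =", round(gamma, 3))
print("level mu0(phi) =", round(phi @ mu0, 3))   # exactly 0.25
print("power mu1(phi) =", round(phi @ mu1, 3))
```

Choosing \(k\) as an upper quantile of \(L\) under \(\mu_0\) is exactly what guarantees \(\mu_0(L>k)\le\alpha\le\mu_0(L\ge k)\), so the randomization probability \(\gamma = \frac{\alpha-\mu_0(L>k)}{\mu_0(L=k)}\) lies in \([0,1]\).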
Suppose we are testing
\[\mathbb{P}(X|\theta) = \text{Exponential}(\theta): \quad \theta e^{-\theta x}1_{x\ge 0}\,\mathrm{d}{x},\] with hypotheses \(H_0: \theta = \theta_0\) and \(H_1:\theta=\theta_1\); for simplicity assume \(\theta_1>\theta_0\). The likelihood ratio test is of the form
\[ \frac{\theta_1e^{-\theta_1x}}{\theta_0e^{-\theta_0x}} > k \Leftrightarrow x < \frac{1}{\theta_1-\theta_0}\log\frac{\theta_1}{k\theta_0} = x_k.\]
\[\alpha = \mu_0(\varphi_\alpha) = \int_0^{x_k} \theta_0e^{-\theta_0x}\,\mathrm{d}{x} = 1 - e^{-\theta_0x_k},\]
so
\[ x_k = \frac{1}{\theta_0}\log\frac{1}{1-\alpha}.\]
And the test with level \(\alpha\) is simply given by \(\varphi_\alpha = 1_{x<\frac{1}{\theta_0}\log \frac{1}{1-\alpha}}\). Some magic is happening here: this test is optimal, in that it maximizes \(\mu_1(\varphi)\) among tests of level \(\le \alpha\), yet it does not depend on \(\theta_1\)! (This is an example of a Uniformly Most Powerful test. An interesting question is: when does this happen?)
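A quick numerical check of this example (a sketch using the formulas just derived, with illustrative values of \(\theta_0\), \(\theta_1\) and \(\alpha\)): the cutoff \(x_k\) depends only on \(\theta_0\) and \(\alpha\), while the power \(\mu_1(\varphi_\alpha) = 1 - e^{-\theta_1 x_k}\) does depend on \(\theta_1\) and exceeds \(\alpha\) whenever \(\theta_1 > \theta_0\).

```python
import numpy as np

theta0, alpha = 1.0, 0.05                    # null rate and target level (illustrative values)
x_k = np.log(1.0 / (1.0 - alpha)) / theta0   # cutoff derived above; independent of theta1

# Level under H_0: P(X < x_k | theta0) = 1 - exp(-theta0 * x_k) = alpha.
level = 1.0 - np.exp(-theta0 * x_k)

# Power under various alternatives theta1 > theta0: same test, different power.
for theta1 in [1.5, 2.0, 5.0]:
    power = 1.0 - np.exp(-theta1 * x_k)
    print(f"theta1={theta1}: level={level:.3f}, power={power:.3f}")
```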
Consider a very simple random variable \(X\sim \text{Bernoulli}(p)\), with \(H_0: p=\frac{1}{2}\) and \(H_1: p=\frac{1}{4}\). The likelihood ratio is
\[ L(x) = \begin{cases} \frac{1}{2} & x=1 \\ \frac{3}{2} & x=0. \end{cases}\]
Then clearly there are \(5\) different regions of \(k\) we can take to form different tests \(\varphi = \begin{cases} 1 & L(x) > k \\ \gamma & L(x) = k \\ 0 & L(x) < k \end{cases}\)
\[ \left[0,\frac{1}{2}\right) , \left\{\frac{1}{2}\right\} , \left(\frac{1}{2},\frac{3}{2}\right) , \left\{\frac{3}{2}\right\} , \left(\frac{3}{2},\infty\right) . \]
The corresponding significance levels are
\[ \alpha = \mu_0(\varphi_{k,\gamma}) = \begin{cases} 1 & k \in [0,\frac{1}{2}) \\ 1\cdot \gamma + \frac{1}{2}\cdot (1-\gamma) & k = \frac{1}{2} \\ \frac{1}{2} & k \in (\frac{1}{2},\frac{3}{2}) \\ \frac{1}{2}\cdot \gamma + 0\cdot (1-\gamma) & k = \frac{3}{2} \\ 0 & k \in (\frac{3}{2},\infty) \end{cases}.\]
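This table can be checked mechanically; the short sketch below (my own) evaluates \(\mu_0(\varphi_{k,\gamma})\) for one \(k\) from each of the five regions and an arbitrary \(\gamma\), showing in particular how \(\gamma\) interpolates the achievable levels at \(k=\frac{1}{2}\) and \(k=\frac{3}{2}\).

```python
# Bernoulli example: mu0 = Bernoulli(1/2), mu1 = Bernoulli(1/4).
# Likelihood ratio: L(1) = (1/4)/(1/2) = 1/2,  L(0) = (3/4)/(1/2) = 3/2.
p0 = {1: 0.5, 0: 0.5}
L = {1: 0.5, 0: 1.5}

def phi(x, k, gamma):
    """Randomized LR test: reject w.p. 1 if L(x) > k, w.p. gamma if L(x) == k."""
    if L[x] > k:
        return 1.0
    if L[x] == k:
        return gamma
    return 0.0

def level(k, gamma):
    """Significance level mu0(phi_{k,gamma}) = E[phi(X) | p = 1/2]."""
    return sum(p0[x] * phi(x, k, gamma) for x in (0, 1))

gamma = 0.4  # arbitrary randomization probability, for illustration
for k in [0.25, 0.5, 1.0, 1.5, 2.0]:   # one k from each of the five regions
    print(f"k={k}: alpha = {level(k, gamma)}")
```

In particular, to hit a target level such as \(\alpha = 0.3\) one must take \(k=\frac{3}{2}\) and randomize with \(\gamma = 0.6\).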