Author: Eiko

Time: 2024-12-28 15:59:49 - 2025-01-03 12:56:59 (UTC)


Let $\Omega$ be a parameter space, let $\Omega = \Omega_0 \sqcup \Omega_1$ be a partition of this space, and let $X \sim P(X|\theta)$ be the law of the observation for a parameter $\theta \in \Omega$. The hypothesis $H_i$ asserts that $\theta \in \Omega_i$.

Hypothesis testing aims to decide which of the two competing hypotheses $H_0$ or $H_1$ is correct by observing $X$.

Test Functions

Non-Randomized Tests

  • A non-randomized test of $H_0$ versus $H_1$ can be specified by a critical region $S$: if $X \in S$ we reject $H_0$ in favor of $H_1$.

  • The power function $\beta_S : \Omega \to \mathbb{R}$ describes the probability of rejecting $H_0$ given $\theta$,

    $\beta_S(\theta) = P(X \in S \mid \theta).$

  • The significance level $\alpha$ is the worst-case error rate of falsely rejecting $H_0$ when it is true,

    $\alpha_S = \sup_{\theta \in \Omega_0} \beta_S(\theta).$

    In theory we would want $\beta_S = 1_{\Omega_1}$, which would imply $\alpha_S = 0$, but this is not possible in practice. (A Monte Carlo sketch of these two quantities follows this list.)
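As a quick illustration of these two definitions, here is a minimal Monte Carlo sketch; the model $X \sim N(\theta, 1)$, the critical region $S = [c, \infty)$, and the null region $\Omega_0 = (-\infty, 0]$ are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_S(theta, c, n_mc=200_000):
    """Monte Carlo estimate of the power beta_S(theta) = P(X in S | theta)
    for the critical region S = [c, oo) under the model X ~ N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=n_mc)
    return (x >= c).mean()

# Significance level alpha_S = sup of beta_S over the null region (-oo, 0];
# beta_S is increasing in theta here, so the sup is attained at theta = 0.
c = 1.645
print("level alpha_S ~", beta_S(0.0, c))     # ~ 0.05
print("power at theta=1 ~", beta_S(1.0, c))  # considerably larger
```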

Randomized Tests

Sometimes, instead of giving a critical region $S$, or equivalently a function $1_S$, we give a critical function $\varphi(x) \in [0,1]$, reflecting the probability of rejecting $H_0$ upon observing $x$. A non-randomized test is then just the special case $\varphi = 1_S$.

In this case, the power function is

$\beta_\varphi(\theta) = E(\varphi(X) \mid \theta)$

and the significance level is

$\alpha_\varphi = \sup_{\theta \in \Omega_0} \beta_\varphi(\theta) = \sup_{\theta \in \Omega_0} E(\varphi(X) \mid \theta).$

The main advantage of randomized tests is that they can form (convex) linear combinations: the set of critical functions is convex, and the power function depends linearly on $\varphi$. A sketch illustrating this is below.
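A minimal sketch of this convexity (the model $X \sim N(\theta,1)$ and the two tests are illustrative assumptions): a convex combination of two critical functions is again a critical function, and its power function is the same convex combination of the two power functions.

```python
import numpy as np

rng = np.random.default_rng(1)

def power(phi, theta, n_mc=200_000):
    """beta_phi(theta) = E(phi(X) | theta) under the model X ~ N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=n_mc)
    return phi(x).mean()

phi1 = lambda x: (x >= 1.645).astype(float)  # non-randomized test, phi = 1_S
phi2 = lambda x: np.full_like(x, 0.05)       # reject with probability 0.05, ignoring x
lam = 0.3
phi_mix = lambda x: lam * phi1(x) + (1 - lam) * phi2(x)  # still maps into [0, 1]

for theta in (0.0, 1.0):
    b1, b2, bm = (power(p, theta) for p in (phi1, phi2, phi_mix))
    print(f"theta={theta}: beta1={b1:.3f} beta2={b2:.3f} mix={bm:.3f} "
          f"check={lam*b1 + (1-lam)*b2:.3f}")  # mix ~ check, up to MC error
```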

Simple Hypothesis And Simple Tests

A hypothesis $H_i$ is simple if $\Omega_i$ is a singleton $\{\theta_i\}$.

Neyman-Pearson Lemma

Assume $H_0$ and $H_1$ are both simple; in this case there is a Neyman-Pearson Lemma describing all reasonable tests. Let $\mu_1 = P(X|\theta_1)$ and $\mu_0 = P(X|\theta_0)$ be the distributions of $X$ under $H_1$ and $H_0$ respectively.

We have

$\alpha_\varphi = \mu_0(\varphi) = \int \varphi(x)\,\mu_0(dx), \qquad \beta_\varphi(\theta_i) = \mu_i(\varphi) = \int \varphi(x)\,\mu_i(dx).$

We would want to choose $\varphi$ such that $\mu_0(\varphi)$ is close to $0$ and $\mu_1(\varphi)$ is close to $1$. Consider maximizing $\beta_\varphi(\theta_1)$ subject to $\alpha_\varphi \le \alpha$.

Lagrange Multiplier Lemma

  • Let $k \ge 0$ be any constant. Then maximizing $\mu_1(\varphi) - k\mu_0(\varphi)$ gives a function $\varphi^*$ maximizing $\mu_1(\varphi)$ subject to $\mu_0(\varphi) \le \alpha$, where $\alpha = \mu_0(\varphi^*)$:

    $\varphi^* \in \operatorname{argmax}_\varphi \left( \mu_1(\varphi) - k\mu_0(\varphi) \right) \subseteq \operatorname{argmax}_{\mu_0(\varphi) \le \alpha} \mu_1(\varphi).$

  • Moreover, when $k > 0$, any function $\varphi'$ maximizing $\mu_1(\varphi)$ subject to $\mu_0(\varphi) \le \alpha$ must have $\mu_0(\varphi') = \alpha$:

    $\varphi' \in \operatorname{argmax}_{\mu_0(\varphi) \le \alpha} \mu_1(\varphi) \subseteq \{\varphi : \mu_0(\varphi) = \alpha\}.$

Note that $\varphi^*$ and $\alpha$ depend on $k$ here. (A numerical check of this lemma appears after the proof.)

Proof.

  • Let $\varphi^*$ be the function maximizing $\mu_1(\varphi) - k\mu_0(\varphi)$. It suffices to prove that $\mu_0(\varphi) \le \alpha \implies \mu_1(\varphi) \le \mu_1(\varphi^*)$.

    Adding $k\alpha$ to both sides of the maximizing property, we have

    $\mu_1(\varphi^*) - k(\mu_0(\varphi^*) - \alpha) \ge \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha).$

    Now $\mu_0(\varphi) - \alpha \le 0 \implies -k(\mu_0(\varphi) - \alpha) \ge 0$, while $\mu_0(\varphi^*) - \alpha = 0$ by the definition of $\alpha$, therefore

    $\mu_1(\varphi) \le \mu_1(\varphi) - k(\mu_0(\varphi) - \alpha) \le \mu_1(\varphi^*) - k(\mu_0(\varphi^*) - \alpha) = \mu_1(\varphi^*).$

  • We know that $\varphi^*$ and $\varphi'$ are both in $\operatorname{argmax}_{\mu_0(\varphi) \le \alpha} \mu_1(\varphi)$, therefore $\mu_1(\varphi') = \mu_1(\varphi^*)$. The fact that $\varphi^* \in \operatorname{argmax} \left( \mu_1(\varphi) - k\mu_0(\varphi) \right)$ implies

    $\mu_1(\varphi') - k\mu_0(\varphi') \le \mu_1(\varphi^*) - k\mu_0(\varphi^*) = \mu_1(\varphi') - k\mu_0(\varphi^*).$

    Therefore $\mu_0(\varphi') \ge \mu_0(\varphi^*) = \alpha$ (this is where $k > 0$ is used), and since $\mu_0(\varphi') \le \alpha$ by the constraint, $\mu_0(\varphi') = \alpha$.
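As a numerical sanity check of the lemma (the densities and $k$ below are illustrative assumptions), one can compare the pointwise maximizer of $\mu_1(\varphi) - k\mu_0(\varphi)$ on a finite sample space against a linear program that maximizes $\mu_1(\varphi)$ subject to $\mu_0(\varphi) \le \alpha$ directly:

```python
import numpy as np
from scipy.optimize import linprog

# Densities of mu_0 and mu_1 on a 4-point sample space (illustrative numbers).
p0 = np.array([0.4, 0.3, 0.2, 0.1])
p1 = np.array([0.1, 0.2, 0.3, 0.4])
k = 1.5

# Maximize mu_1(phi) - k*mu_0(phi) pointwise: phi* = 1 exactly where p1 > k*p0.
phi_star = (p1 > k * p0).astype(float)
alpha = float(p0 @ phi_star)  # the level alpha = mu_0(phi*) attached to this k

# Maximize mu_1(phi) subject to mu_0(phi) <= alpha and 0 <= phi <= 1 as an LP
# (linprog minimizes, so negate the objective).
res = linprog(c=-p1, A_ub=[p0], b_ub=[alpha], bounds=[(0, 1)] * len(p0))
print("alpha =", alpha)
print("power of phi*:", p1 @ phi_star, " LP optimum:", -res.fun)  # these agree
```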

How To Maximize $\mu_1(\varphi) - k\mu_0(\varphi)$?

We know that $\mu_1 - k\mu_0$ is a finite signed measure, and according to the Hahn-Jordan decomposition, any finite signed measure can be uniquely decomposed into the difference of two mutually singular finite measures,

$\mu_1 - k\mu_0 = \nu_+ - \nu_-.$

So maximizing $\mu_1(\varphi) - k\mu_0(\varphi)$ is equivalent to maximizing $\nu_+(\varphi) - \nu_-(\varphi)$, from which it is clear that we can pick $\varphi = 1_{A_+}$, where $A_+$ is the set on which $\nu_+$ is concentrated; and there is a freedom for us to pick anything from $[0,1]$ on a set of measure zero in $|\mu_1 - k\mu_0|$.

If the $\mu_i$ admit densities with respect to a common dominating measure $\mu$ (e.g. $\mu = \mu_0 + \mu_1$), then the set $A_+$ is simply $\{x : \frac{d\mu_1}{d\mu}(x) > k \frac{d\mu_0}{d\mu}(x)\}$. This can be seen as a slight generalization of a likelihood ratio test: ignoring the division-by-zero problem, it can be written as $\{\frac{d\mu_1}{d\mu_0} > k\}$.
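On a finite sample space the Hahn-Jordan decomposition is just the positive and negative parts of the pointwise density, which makes the maximizer explicit. A small sketch with illustrative numbers:

```python
import numpy as np

# Densities of mu_0, mu_1 with respect to counting measure (illustrative numbers).
p0 = np.array([0.4, 0.3, 0.2, 0.1])
p1 = np.array([0.1, 0.2, 0.3, 0.4])
k = 1.0

d = p1 - k * p0                 # density of the signed measure mu_1 - k*mu_0
nu_plus = np.maximum(d, 0.0)    # the two mutually singular parts:
nu_minus = np.maximum(-d, 0.0)  # nu_+ and nu_- live on disjoint sets
A_plus = d > 0                  # Hahn set where nu_+ is concentrated

phi = A_plus.astype(float)      # phi = 1_{A+} maximizes mu_1(phi) - k*mu_0(phi)
print("objective at phi:", p1 @ phi - k * (p0 @ phi))
print("total mass of nu_+:", nu_plus.sum())  # equal: the maximum is nu_+(Omega)
```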

The Neyman-Pearson Lemma

The Lemma states that, for a simple testing scenario, given any level $\alpha \in [0,1]$, there exists a likelihood ratio test $\varphi_\alpha$ (which means $1_{L > k} \le \varphi_\alpha \le 1_{L \ge k}$ for the likelihood ratio $L = \frac{d\mu_1}{d\mu_0}$, with potentially some other function values on a measure zero set) with exactly level $\alpha$ (i.e. $\mu_0(\varphi_\alpha) = \alpha$). The likelihood ratio test $\varphi_\alpha$ is chosen to maximize $\mu_1(\varphi) - k\mu_0(\varphi)$, and any likelihood ratio test of level $\alpha$ maximizes the power function $\beta_\varphi(\theta_1)$ subject to the significance level constraint $\alpha_\varphi \le \alpha$.
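For discrete distributions the existence part is constructive: choose the smallest $k$ with $\mu_0(L \ge k) \ge \alpha$, then randomize on $\{L = k\}$ to hit the level exactly. A minimal sketch (the helper np_test and the densities are illustrative assumptions, not from the notes):

```python
import numpy as np

def np_test(p0, p1, alpha):
    """Randomized likelihood ratio test of exact level alpha for simple
    hypotheses given by discrete densities p0, p1 (assumes p0 > 0 everywhere).
    Returns (k, gamma, phi) with phi = 1 on {L > k}, gamma on {L = k}, 0 below."""
    L = p1 / p0                            # likelihood ratio at each sample point
    for k in np.unique(L)[::-1]:           # candidate critical values, descending
        tail = p0[L > k].sum()             # mu_0(L > k)
        at_k = p0[L == k].sum()            # mu_0(L = k)
        if tail + at_k >= alpha:           # first k with mu_0(L >= k) >= alpha
            gamma = (alpha - tail) / at_k  # randomization weight on {L = k}
            phi = np.where(L > k, 1.0, np.where(L == k, gamma, 0.0))
            return k, gamma, phi
    return 0.0, 1.0, np.ones_like(L)       # alpha = 1: always reject

# Bernoulli(1/2) versus Bernoulli(1/4), sample space {0, 1} (cf. the example below).
p0 = np.array([0.5, 0.5])    # (P(X=0), P(X=1)) under H0
p1 = np.array([0.75, 0.25])  # under H1
k, gamma, phi = np_test(p0, p1, alpha=0.1)
print(k, gamma, phi, "level:", p0 @ phi, "power:", p1 @ phi)
```

Note that the printed power (0.15) strictly exceeds the level (0.1), consistent with the result $\mu_1(\varphi_\alpha) > \alpha$ below.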

Some Detailed Results Related To The Neyman-Pearson Lemma

  • For $\alpha \in [0,1]$, let $k$ be a critical value for a likelihood ratio test $\varphi_\alpha$ in the sense of the Neyman-Pearson Lemma, i.e.

    $\varphi_\alpha = 1_{\{x : \frac{d\mu_1}{d\mu_0}(x) > k\}}$ a.e. in $|\mu_1 - k\mu_0|$.

    Then $\mu_0(\varphi_\alpha) = \alpha$ and $\mu_1(\varphi_\alpha) = \beta_{\varphi_\alpha}(\theta_1)$.

    We have

    $\varphi \in \operatorname{argmax}_{\mu_0(\varphi) \le \alpha} \mu_1(\varphi) \implies \varphi = \varphi_\alpha$ a.e. in $|\mu_1 - k\mu_0|$.

  • If $\mu_0 \ne \mu_1$ or $k \ne 1$, with $\varphi_\alpha$ a likelihood ratio test of level $\alpha \in (0,1)$, then $\mu_1(\varphi_\alpha) > \alpha$. In particular,

    $\mu_0 \ne \mu_1 \implies \mu_1(\varphi_\alpha) > \alpha.$

Proof.

  • We already proved that $\mu_0(\varphi) = \mu_0(\varphi_\alpha) = \alpha$; since also $\mu_1(\varphi) = \mu_1(\varphi_\alpha)$ (both are maximizers),

    $(\mu_1 - k\mu_0)(\varphi) = (\mu_1 - k\mu_0)(\varphi_\alpha),$

    and by the construction of $\varphi_\alpha$, the integrand satisfies $(\varphi_\alpha - \varphi)\,d(\mu_1 - k\mu_0) \ge 0$ a.e. in $|\mu_1 - k\mu_0|$. A nonnegative integrand with zero integral vanishes, so $\varphi = \varphi_\alpha$ a.e. in $|\mu_1 - k\mu_0|$.

  • Consider the constant test $\varphi_c = \alpha \in (0,1)$. By $\varphi_\alpha \in \operatorname{argmax}_{\mu_0(\varphi) \le \alpha} \mu_1(\varphi)$ we know $\mu_1(\varphi_\alpha) \ge \mu_1(\varphi_c) = \alpha$. If equality holds then $\varphi_c$ is also in the argmax, thus $\varphi_c = \varphi_\alpha$ a.e. in $|\mu_1 - k\mu_0|$; but this equality can never hold when $|\mu_1 - k\mu_0| \ne 0$, since $\varphi_\alpha \in \{0,1\}$ a.e. in $|\mu_1 - k\mu_0|$ while $\varphi_c \in (0,1)$. The only possible case is $\mu_1 = k\mu_0$, which forces $k = 1$ and $\mu_0 = \mu_1$ since both are probability measures.

Examples

  • Suppose we are testing

    $P(X|\theta) \sim \mathrm{Exponential}(\theta) = \theta e^{-\theta x} 1_{x \ge 0}\,dx$

    with hypotheses $H_0 : \theta = \theta_0$ and $H_1 : \theta = \theta_1$; for simplicity assume $\theta_1 > \theta_0$. The likelihood ratio test is of the form

    $\frac{\theta_1 e^{-\theta_1 x}}{\theta_0 e^{-\theta_0 x}} > k \iff x < \frac{1}{\theta_1 - \theta_0} \log\frac{\theta_1}{k\theta_0} =: x_k.$

    Solving for the level,

    $\alpha = \mu_0(\varphi) = \int_0^{x_k} \theta_0 e^{-\theta_0 x}\,dx = 1 - e^{-\theta_0 x_k},$

    $x_k = \frac{1}{\theta_0} \log\frac{1}{1-\alpha}.$

    And the test with level $\alpha$ is simply given by $\varphi_\alpha = 1_{x < \frac{1}{\theta_0} \log\frac{1}{1-\alpha}}$. Some magic is happening here: this test is optimal, as it maximizes $\mu_1(\varphi)$ among tests of level $\alpha$, but it is independent of $\theta_1$! (This is an example of a Uniformly Most Powerful Test; an interesting question is when this happens. A numerical check follows these examples.)

  • Consider a very simple random variable $X \sim \mathrm{Bernoulli}(p)$, with $H_0 : p = \frac12$ and $H_1 : p = \frac14$. The likelihood ratio is

    $L(x) = \begin{cases} \frac12 & x = 1 \\ \frac32 & x = 0. \end{cases}$

    Then clearly there are 5 different regions of $k$ we can take to form different tests

    $\varphi_{k,\gamma} = \begin{cases} 1 & L(x) > k \\ \gamma & L(x) = k \\ 0 & L(x) < k, \end{cases} \qquad k \in [0, \tfrac12),\ \{\tfrac12\},\ (\tfrac12, \tfrac32),\ \{\tfrac32\},\ (\tfrac32, \infty).$

    The corresponding significance levels are

    $\alpha = \mu_0(\varphi_{k,\gamma}) = \begin{cases} 1 & k \in [0, \tfrac12) \\ \tfrac12 + \tfrac12\gamma & k = \tfrac12 \\ \tfrac12 & k \in (\tfrac12, \tfrac32) \\ \tfrac12\gamma & k = \tfrac32 \\ 0 & k \in (\tfrac32, \infty). \end{cases}$

    (These values are verified numerically in the second sketch below.)
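A quick Monte Carlo check of the exponential example (the parameter values $\theta_0 = 1$, $\theta_1 = 2$, $\alpha = 0.05$ are illustrative assumptions): the test $\varphi_\alpha = 1_{x < x_k}$ should have level $\alpha$ under $\theta_0$ regardless of $\theta_1$, and power above $\alpha$ under $\theta_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, theta1, alpha = 1.0, 2.0, 0.05       # illustrative parameter values
x_k = np.log(1.0 / (1.0 - alpha)) / theta0   # cutoff from the formula above

# Exponential(theta) samples; reject H0 when x < x_k.
x0 = rng.exponential(1.0 / theta0, size=200_000)  # numpy takes the scale 1/theta
x1 = rng.exponential(1.0 / theta1, size=200_000)
print("level ~", (x0 < x_k).mean())  # ~ 0.05
print("power ~", (x1 < x_k).mean())  # ~ 1 - (1-alpha)^(theta1/theta0) ~ 0.0975
```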
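And a direct computation of the Bernoulli level table (a minimal sketch; the helper level is hypothetical, introduced only for this check):

```python
# Level table for X ~ Bernoulli(p), H0: p = 1/2, H1: p = 1/4.
p0 = {0: 0.5, 1: 0.5}  # P(X = x) under H0
L = {0: 1.5, 1: 0.5}   # likelihood ratio L(x)

def level(k, gamma):
    """alpha = mu_0(phi_{k,gamma}), where phi is 1 / gamma / 0 on L > / = / < k."""
    return sum(p * (1.0 if L[x] > k else gamma if L[x] == k else 0.0)
               for x, p in p0.items())

for k in (0.25, 0.5, 1.0, 1.5, 2.0):  # one representative k from each region
    print(f"k={k}: alpha in [{level(k, 0.0)}, {level(k, 1.0)}] as gamma in [0,1]")
```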