Author: Eiko

Time: 2025-02-24 11:01:21 - 2025-02-24 11:01:21 (UTC)

Equivalent Definitions of RKHS

  1. Given $k : X \times X \to \mathbb{R}$ positive definite,

    $$\mathcal{H}_K = \overline{\operatorname{span}}\{\, k(x, \cdot) : x \in X \,\},$$

    the completion of the span of the kernel sections under the inner product determined by $\langle k(x,\cdot), k(y,\cdot) \rangle_{\mathcal{H}_K} = k(x,y)$.

  2. $\mathcal{H}_K$ is the Hilbert space of functions on $X$ characterized by the following two properties:

    1. $k(x, \cdot) \in \mathcal{H}_K$ for all $x \in X$;

    2. $\langle f, k(x, \cdot) \rangle_{\mathcal{H}_K} = f(x)$ for all $f \in \mathcal{H}_K$, $x \in X$ (the reproducing property; a numerical sketch follows this list).

  3. An RKHS is a Hilbert space of functions on $X$, $\mathcal{H}_K \subseteq \{f : X \to \mathbb{R}\}$, such that every evaluation map $\delta_x : \mathcal{H}_K \to \mathbb{R}$, $f \mapsto f(x)$, is continuous (equivalently, bounded).
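As a quick numerical sanity check of definitions 2 and 3 (a minimal sketch assuming a Gaussian kernel on $\mathbb{R}$; the centers, coefficients, and evaluation point below are arbitrary illustration choices): on the span of kernel sections the inner product is computed from kernel evaluations alone, the reproducing property holds by construction, and point evaluation is bounded via Cauchy-Schwarz, $|f(z)| \le \|f\|_{\mathcal{H}_K}\sqrt{k(z,z)}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def k(x, z, sigma=1.0):
    """Gaussian kernel on R (any positive definite kernel would do)."""
    return np.exp(-(x - z) ** 2 / (2 * sigma ** 2))

# f = sum_i alpha_i k(x_i, .) is a generic element of span{k(x, .) : x in X}.
xs = rng.normal(size=5)          # arbitrary centers x_i
alphas = rng.normal(size=5)      # arbitrary coefficients
K = k(xs[:, None], xs[None, :])  # Gram matrix K_ij = k(x_i, x_j)

def f(z):
    return np.sum(alphas * k(xs, z))

# RKHS norm on the span: ||f||^2 = sum_{i,j} alpha_i alpha_j k(x_i, x_j).
f_norm = np.sqrt(alphas @ K @ alphas)

z = 0.8
# Reproducing property (definition 2): <f, k(z, .)> = sum_i alpha_i k(x_i, z) = f(z).
print(np.isclose(np.sum(alphas * k(xs, z)), f(z)))   # True
# Bounded point evaluation (definition 3): |f(z)| <= ||f|| * sqrt(k(z, z)).
print(abs(f(z)) <= f_norm * np.sqrt(k(z, z)))        # True
```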

Relation With Sobolev Spaces

Let

$$H^s(\mathbb{R}^d) = \{\, f \in L^2(\mathbb{R}^d) : \|f\|_{H^s} < \infty \,\}$$

where

$$\|f\|_{H^s}^2 = \sum_{|\alpha| \le s} \int_{\mathbb{R}^d} |\partial^\alpha f(x)|^2 \, dx.$$
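For example, for $d = 1$ and $s = 1$ the multi-indices are $\alpha = 0, 1$, so the norm unfolds to

$$\|f\|_{H^1}^2 = \int_{\mathbb{R}} |f(x)|^2 \, dx + \int_{\mathbb{R}} |f'(x)|^2 \, dx.$$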

Remarks

  • In one dimension, $H^1$ already has continuous point evaluations, making it an RKHS (a worked example follows these remarks). But in higher dimensions, controlling only first derivatives is no longer enough: for $d \ge 2$, $H^1(\mathbb{R}^d)$ is not an RKHS.

    $H^s(\mathbb{R}^d)$ is an RKHS iff $s > \frac{d}{2}$ (by the Sobolev embedding theorem).

  • If we define derivatives in a more general sense (weak derivatives, etc.), then every RKHS can be viewed as a Sobolev space of some generalized kind.
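As a worked example of the first remark (a sketch, using the $H^1(\mathbb{R})$ inner product $\langle f, g \rangle_{H^1} = \int_{\mathbb{R}} (fg + f'g')\,dx$ corresponding to the norm above with $s = 1$): integration by parts shows that

$$\int_{\mathbb{R}} \tfrac{1}{2} e^{-|x - y|} f(y)\,dy + \int_{\mathbb{R}} \partial_y\big(\tfrac{1}{2} e^{-|x - y|}\big) f'(y)\,dy = f(x),$$

so $H^1(\mathbb{R})$ is an RKHS with kernel $k(x, y) = \tfrac{1}{2} e^{-|x - y|}$: the kernel section $k(x, \cdot)$ lies in $H^1(\mathbb{R})$ and reproduces point evaluation.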

Kernel Ridge Regression

Given data as a finite subset $\{(x_i, y_i)\}_{i=1}^n \subset X \times \mathbb{R}$, we want to find a function $f : X \to \mathbb{R}$ in the space $\mathcal{H}_K$ fitting the data, $f(x_i) \approx y_i$.

Problem. Fix $k : X \times X \to \mathbb{R}$. Minimize

$$\sum_i \big(f(x_i) - y_i\big)^2 + \lambda \|f\|_{\mathcal{H}_K}^2, \qquad f \in \mathcal{H}_K,$$

where $\lambda \ge 0$ is the regularization parameter.

Representer Theorem

If $f$ minimizes the above problem for $\lambda > 0$, then $f$ must be a finite linear combination of the kernel functions $k(x_i, \cdot)$ for $x_i$ in the data set.

To prove this theorem we use the orthogonal decomposition $f = g + h$ with $g \in U_D$ and $h \in U_D^{\perp}$, where $U_D = \operatorname{span}\{k(x_i, \cdot) : x_i \text{ in the data set}\} \subset \mathcal{H}_K$ is the finite-dimensional subspace spanned by the kernel functions at the data points. By the reproducing property, $h(x_i) = \langle h, k(x_i, \cdot) \rangle_{\mathcal{H}_K} = 0$ at every data point, so substituting into the minimization objective gives

$$\sum_i \big(g(x_i) + h(x_i) - y_i\big)^2 + \lambda \|g\|_{\mathcal{H}_K}^2 + \lambda \|h\|_{\mathcal{H}_K}^2 = \sum_i \big(g(x_i) - y_i\big)^2 + \lambda\big(\|g\|_{\mathcal{H}_K}^2 + \|h\|_{\mathcal{H}_K}^2\big).$$

We see that minimizing forces $h = 0$ (unless $\lambda = 0$), since $\lambda \|h\|_{\mathcal{H}_K}^2$ is the only term in which $h$ appears.

Now, writing $e_i = k(x_i, \cdot)$ so that $f(x_i) = \langle f, e_i \rangle$, we are trying to minimize

$$S = \sum_i \big(\langle f, e_i \rangle - y_i\big)^2 + \lambda \langle f, f \rangle.$$

If we differentiate $S$ with respect to $f = \sum_i f_i e_i$,

$$DS = \sum_i 2\big(\langle f, e_i \rangle - y_i\big)\langle Df, e_i \rangle + 2\lambda \langle Df, f \rangle = 2\Big\langle Df,\ \sum_i \big(\langle f, e_i \rangle - y_i\big) e_i + \lambda f \Big\rangle = 2\Big\langle Df,\ \sum_i \big(\langle f, e_i \rangle + \lambda f_i - y_i\big) e_i \Big\rangle$$

and set it to zero for all $Df$, we get (assuming the kernel functions $e_i$ are linearly independent)

$$\langle f, e_i \rangle + \lambda f_i - y_i = 0 \quad \text{for each } i.$$

In terms of matrices this is

$$(K + \lambda I)\mathbf{f} = \mathbf{y}$$

where $K_{ij} = k(x_i, x_j)$, $\mathbf{f} = (f_1, \dots, f_n)^{T}$ is the vector of coefficients, and $\mathbf{y} = (y_1, \dots, y_n)^{T}$.
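A minimal computational sketch of the above (assuming a Gaussian kernel and synthetic one-dimensional data; the kernel, bandwidth $\sigma$, and $\lambda$ are illustrative choices, not prescribed by this note): build the Gram matrix, solve $(K + \lambda I)\mathbf{f} = \mathbf{y}$ for the coefficient vector, and predict with $f(x) = \sum_i f_i k(x_i, x)$, the form guaranteed by the representer theorem.

```python
import numpy as np

def k(x, z, sigma=0.5):
    """Gaussian kernel (illustrative choice; any positive definite kernel works)."""
    return np.exp(-(x - z) ** 2 / (2 * sigma ** 2))

# Synthetic one-dimensional data (x_i, y_i).
rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=30)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=30)

lam = 1e-2                                 # regularization parameter lambda > 0
K = k(x_train[:, None], x_train[None, :])  # Gram matrix K_ij = k(x_i, x_j)

# Solve (K + lambda I) f = y for the coefficient vector f = (f_1, ..., f_n).
coef = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def predict(x):
    """f(x) = sum_i f_i k(x_i, x)."""
    return k(np.atleast_1d(x)[:, None], x_train[None, :]) @ coef

print(predict(np.linspace(-3, 3, 5)))  # smoothed estimates of sin at 5 test points
```

Solving the linear system directly is preferable to forming $(K + \lambda I)^{-1}$ explicitly; since $K + \lambda I$ is symmetric positive definite for $\lambda > 0$, a Cholesky factorization is the usual choice for larger data sets.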