Given a positive definite kernel \(k:X\times X\to \mathbb{R}\), define
\[\mathcal{H}_K = \overline{\mathrm{span}\{ k(x,\cdot) : x\in X\}},\]
the closure being taken with respect to the inner product determined by \(\langle k(x,\cdot), k(y,\cdot)\rangle = k(x,y)\).
Equivalently, \(\mathcal{H}_K\) is the unique Hilbert space of functions on \(X\) satisfying the following two properties:
\(k(x,\cdot)\in \mathcal{H}_K\) for all \(x\in X\);
\(\langle f, k(x,\cdot)\rangle_{\mathcal{H}_K} = f(x)\) for all \(f\in \mathcal{H}_K,\ x\in X\) (the reproducing property).
More generally, an RKHS is a Hilbert space of functions on \(X\), \(\mathcal{H}_K\subseteq\{f:X\to \mathbb{R}\}\), such that every evaluation map \(\delta_x:\mathcal{H}_K\to \mathbb{R}\), \(\delta_x(f)=f(x)\), is bounded (equivalently, continuous).
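As a quick numerical illustration (the Gaussian kernel and the sample points below are just an example, not anything fixed by these notes): for a positive definite \(k\), every Gram matrix \(K_{ij}=k(x_i,x_j)\) built from finitely many points is symmetric positive semidefinite.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * lengthscale**2))

# A few points in X = R^2 and their Gram matrix K_ij = k(x_i, x_j).
points = np.array([[0.0, 0.0], [1.0, 0.5], [-0.3, 2.0], [2.0, -1.0]])
K = np.array([[rbf_kernel(xi, xj) for xj in points] for xi in points])

# Positive definiteness of k means the Gram matrix is symmetric PSD.
print(np.allclose(K, K.T))                        # True
print(np.all(np.linalg.eigvalsh(K) >= -1e-12))    # True, up to round-off
```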
Let
\[H^s(\mathbb{R}^d) = \left\{ f\in L^2(\mathbb{R}^d) : \|f\|_{H^s}<\infty \right\}\]
where, for a nonnegative integer \(s\) (the derivatives \(\partial^\alpha\) taken in the weak sense),
\[\|f\|_{H^s}^2 = \sum_{|\alpha|\leq s} \int_{\mathbb{R}^d} |\partial^\alpha f(x)|^2 \,dx.\]
Remarks
In one dimension, \(H^1\) already has continuous point evaluations, so it is an RKHS (an explicit kernel is worked out after these remarks). In higher dimensions, controlling only first-order derivatives is no longer enough.
\(H^s(\mathbb{R}^d)\) is an RKHS iff \(s>\frac{d}{2}\) (ref: Sobolev embedding theorem).
If one allows more general notions of derivative (weak derivatives, fractional orders, etc.), then many RKHSs can in turn be viewed as Sobolev-type spaces.
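As a concrete instance of the first remark (the explicit kernel below is a standard fact, stated for illustration rather than derived from the definitions above): on \(H^1(\mathbb{R})\) with the norm \(\|f\|_{H^1}^2=\int f^2+(f')^2\), the reproducing kernel is \(k(x,y)=\tfrac12 e^{-|x-y|}\). Indeed, integrating by parts,
\[\langle f, k(x,\cdot)\rangle_{H^1} = \int f(y)\,k(x,y)\,dy + \int f'(y)\,\partial_y k(x,y)\,dy = \int f(y)\bigl(k(x,y)-\partial_y^2 k(x,y)\bigr)\,dy = f(x),\]
because \(\partial_y^2 k(x,y)=k(x,y)\) for \(y\neq x\) and \(\partial_y k(x,\cdot)\) jumps by \(-1\) at \(y=x\), so \(k(x,\cdot)-\partial_y^2 k(x,\cdot)=\delta_x\) in the sense of distributions.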
Given data, a finite subset \(\{(x_i,y_i)\}_{i=1}^n\subset X\times \mathbb{R}\), we want to find a function \(f:X\to \mathbb{R}\) in the space \(\mathcal{H}_K\) fitting the data, \(f(x_i)\approx y_i\).
Problem. Fix \(k:X^2\to\mathbb{R}\), minimize \[\mathbb{E}(f(x_i)-y_i)^2 + \lambda \|f\|_{\mathcal{H}_K}^2, \quad f\in \mathcal{H}_K,\]
where \(\lambda\ge 0\) is the regularization parameter and \(\mathbb{E}\) denotes the sum over the data points \(i=1,\dots,n\) (averaging instead would only rescale \(\lambda\)).
Theorem (representer theorem). If \(f^*\) minimizes the above problem for \(\lambda>0\), then \(f^*\) must be a finite linear combination of the kernel functions at the data points: \(f^*=\sum_i \alpha_i\, k(x_i,\cdot)\) for some \(\alpha_i\in\mathbb{R}\).
To prove this theorem we use the orthogonal decomposition \(f = g + h \in \mathcal{U}_D\oplus \mathcal{U}_D^\perp\), where \(\mathcal{U}_D = \mathrm{span}\{k(x_i,\cdot)\}\) is the finite-dimensional subspace spanned by the kernel functions at the data points. Since \(h\perp \mathcal{U}_D\), the reproducing property gives \(h(x_i)=\langle h, k(x_i,\cdot)\rangle = 0\), and \(\|f\|^2 = \|g\|^2+\|h\|^2\). Substituting into the objective,
\[\mathbb{E}(g(x_i)+h(x_i) - y_i)^2 + \lambda \|g\|_{\mathcal{H}_K}^2 + \lambda \|h\|_{\mathcal{H}_K}^2 = \mathbb{E}(g(x_i) - y_i)^2 + \lambda \left(\|g\|_{\mathcal{H}_K}^2 + \|h\|_{\mathcal{H}_K}^2\right).\]
Minimizing therefore forces \(h=0\) (unless \(\lambda=0\)), so the minimizer lies in \(\mathcal{U}_D\).
Writing \(e_i = k(x_i,\cdot)\), so that \(f(x_i)=\langle f, e_i\rangle\), we are now trying to minimize
\[S=\mathbb{E}(\langle f, e_i\rangle - y_i)^2 + \lambda \langle f, f\rangle\]
If we differentiate \(S\) at \(f = \sum_i f_i e_i\) in an arbitrary direction \(Df\),
\[\begin{align*} DS &= \mathbb{E}2(\langle f, e_i\rangle - y_i)\langle Df, e_i\rangle + 2\lambda \langle Df, f\rangle\\ &= 2\langle Df, \mathbb{E}(\langle f,e_i\rangle - y_i)e_i+\lambda f\rangle\\ &= 2\langle Df, \mathbb{E}(\langle f,e_i\rangle + \lambda f_i- y_i)e_i\rangle\\ \end{align*}\]
and set it to zero for all directions \(Df\), we get
\[ \langle f, e_i\rangle + \lambda f_i - y_i = 0 \]
In terms of matrices, since \(\langle f, e_i\rangle = \sum_j f_j\, k(x_j,x_i) = (K\underline{f})_i\), this is
\[ (K+\lambda I)\, \underline{f} = \underline{y}, \]
where \(K_{ij} = k(x_i,x_j)\), \(\underline{f}=(f_1,\dots,f_n)\) is the coefficient vector, and \(\underline{y}=(y_1,\dots,y_n)\).
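A minimal numerical sketch of this last step (the one-dimensional toy data, the Gaussian kernel, and the value of \(\lambda\) are made-up choices for illustration): solve \((K+\lambda I)\,\underline{f}=\underline{y}\) for the coefficients and evaluate \(f^*(x)=\sum_i f_i\, k(x_i,x)\).

```python
import numpy as np

def k(x, y, lengthscale=1.0):
    """Gaussian kernel on R; any positive definite kernel would do."""
    return np.exp(-(x - y) ** 2 / (2.0 * lengthscale ** 2))

# Toy data (x_i, y_i); purely illustrative.
x_data = np.array([-2.0, -0.5, 0.3, 1.1, 2.4])
y_data = np.sin(x_data)
lam = 0.1  # regularization parameter lambda > 0

# Gram matrix K_ij = k(x_i, x_j) and coefficients from (K + lam*I) f = y.
K = k(x_data[:, None], x_data[None, :])
coef = np.linalg.solve(K + lam * np.eye(len(x_data)), y_data)

# The minimizer f*(x) = sum_i coef_i * k(x_i, x), a finite combination of kernel functions.
def f_star(x):
    return k(x_data, x) @ coef

print(f_star(0.0))  # fitted value at a new point
```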