In this course, we discuss how to use the reproducing kernel Hilbert space (RKHS) to do machine learning.
We begin by writing down positive definite kernels; these kernels will serve as our building blocks.
Definition (Positive Definite Kernels). Let \(X\neq\varnothing\) be any set. A map \(k:X\times X\to \mathbb{R}\) is called a positive definite kernel if
\(k\) is symmetric: \(k(x,y)=k(y,x)\) for all \(x,y\in X\);
\(k\) is positive semi-definite: for any \(n\in \mathbb{N}\), \(x_1,\dots,x_n\in X\), and \(\alpha_1,\dots,\alpha_n\in \mathbb{R}\), we have
\[\sum_{i,j} \alpha_i \alpha_j k(x_i,x_j)\geq 0\]
This is equivalent to saying that the Gram matrix \((k(x_i,x_j))_{i,j}\) is positive semi-definite.
You can also view \(k\) as defining a symmetric positive semi-definite bilinear form (an inner product, up to degeneracy) on the vector space \(\mathbb{R}^X\) with distinguished basis \(X\).
Example (Gaussian kernel). For \(X=\mathbb{R}^d\) and \(\sigma>0\),
\[k(x,y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)\]
is a positive definite kernel.
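As a quick numerical sanity check (a sketch using NumPy; the function name, sample points, and tolerances are our choices, not from the lecture), one can sample points, build the Gaussian Gram matrix, and verify symmetry and non-negative eigenvalues:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix with entries k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))              # 50 points in R^3
K = gaussian_kernel(X, X)

symmetric = np.allclose(K, K.T)           # k(x, y) = k(y, x)
min_eig = np.linalg.eigvalsh(K).min()     # should be >= 0 up to round-off
```

Of course, passing such a check for one sample does not prove positive definiteness; it only fails to refute it.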
Remarks
Intuition: the values \(k(x,y)\) encode the similarity between \(x\) and \(y\).
For \(X=\mathbb{R}^d\), positive semi-definiteness has a continuous analogue: for suitable \(f\),
\[\int\int k(x,y)f(x)f(y)\,\mathrm{d}x\,\mathrm{d}y\ge 0.\]
Note that \(k(x,y)\ge 0\) pointwise does not imply that \(k\) is positive definite.
\[k(x,y) = \exp\left(-\frac{|x-y|^p}{\sigma^p}\right)\]
It is positive definite for \(p\in [1,2]\).
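A numerical spot check of this family (again only a sketch; the sample and exponents are our choices) computes the smallest Gram eigenvalue for a few \(p\in[1,2]\):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=40)          # 40 points on the line
D = np.abs(x[:, None] - x[None, :])          # pairwise distances |x_i - x_j|

# smallest eigenvalue of exp(-D^p) for a few exponents, sigma = 1
min_eigs = {p: np.linalg.eigvalsh(np.exp(-D**p)).min() for p in (1.0, 1.5, 2.0)}
# each minimum eigenvalue is >= 0 up to round-off, consistent with the claim
```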
Polynomial kernels:
\[k(x,y) = (x^Ty+c)^m, \quad x,y\in \mathbb{R}^d, c>0, m\in \mathbb{N}.\]
Given two positive definite kernels \(k_1,k_2 : X\times X\to \mathbb{R}\), the following are again positive definite kernels:
\[\begin{align*} (k_1+k_2)(x,y) & := k_1(x,y)+k_2(x,y) \\ k_1k_2(x,y) & := k_1(x,y)\cdot k_2(x,y) \end{align*}\]
Scaling: if \(f:X\to \mathbb{R}\) is any function and \(k:X\times X\to \mathbb{R}\) is positive definite, then
\[k_f(x,y) := f(x)f(y)k(x,y)\]
will also be positive definite.
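All three closure properties can be spot-checked on Gram matrices (a sketch; the kernels, points, and scaling function below are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
K1 = np.exp(-np.abs(x[:, None] - x[None, :]))      # Laplacian kernel (p = 1 above)
K2 = (np.outer(x, x) + 1.0) ** 2                   # polynomial kernel, c = 1, m = 2
f  = np.sin(x)                                     # an arbitrary scaling function

candidates = {
    "sum":     K1 + K2,
    "product": K1 * K2,                            # entrywise (Schur) product
    "scaled":  np.outer(f, f) * K1,                # f(x_i) f(x_j) k_1(x_i, x_j)
}
min_eigs = {name: np.linalg.eigvalsh(K).min() for name, K in candidates.items()}
# all minimum eigenvalues are >= 0 up to round-off
```

The product case is exactly the Schur product theorem: the entrywise product of positive semi-definite matrices is positive semi-definite.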
The associated RKHS is
\[\mathcal{H}_K = \{ \text{limits of linear combinations of functions of the form $k(x,\cdot)$} \}\]
Construction
\[\begin{align*} \mathcal{H}_K^\circ &= \mathbb{R}\{ k(x, \cdot) : x\in X \} \\ &= \bigoplus_{x\in X} \mathbb{R}k(x,\cdot) \\ &= \left\{ f=\sum_{i=1}^n c_i k(x_i,\cdot) : n\in \mathbb{N}, c_i\in \mathbb{R}, x_i\in X \right\} \end{align*}\]
We equip \(\mathcal{H}_K^\circ\) with an inner product \(\langle\cdot,\cdot\rangle_K\) such that
\[f(x) = \langle f, k(x,\cdot) \rangle_K, \quad f\in \mathcal{H}_K^\circ.\]
This is called the reproducing property. Its point is that it reduces the computation of an inner product on an infinite dimensional function space to evaluating the kernel, i.e. to a single number.
The reproducing property determines the inner product on \(\mathcal{H}_K^\circ\): writing
\[f = \sum_i a_i k(x_i, \cdot), \quad g = \sum_j b_j k(y_j, \cdot),\]
we must have
\[\langle f, g\rangle_{\mathcal{H}_K} = \sum_{i,j} a_i b_j k(x_i,y_j).\]
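This formula is directly computable; the following sketch (coefficients, centers, and the Gaussian kernel are our choices) evaluates \(\langle f,g\rangle_{\mathcal{H}_K}\) and checks the reproducing property at one point:

```python
import numpy as np

def k(x, y, sigma=1.0):
    """Gaussian kernel on R; any positive definite kernel would do."""
    return np.exp(-(x - y)**2 / (2.0 * sigma**2))

# f = sum_i a_i k(x_i, .)   and   g = sum_j b_j k(y_j, .)
a, xs = np.array([1.0, -2.0, 0.5]), np.array([0.0, 1.0, -1.0])
b, ys = np.array([3.0, 1.0]),       np.array([0.5, 2.0])

inner_fg = a @ k(xs[:, None], ys[None, :]) @ b   # sum_{ij} a_i b_j k(x_i, y_j)
norm_sq  = a @ k(xs[:, None], xs[None, :]) @ a   # <f, f> >= 0 by PSD of k

def f(t):
    return np.sum(a * k(xs, t))                  # pointwise evaluation of f

# reproducing property: f(x0) equals the pairing of f with k(x0, .)
x0  = 0.7
lhs = f(x0)
rhs = a @ k(xs, x0)                              # <f, k(x0, .)> by the formula
```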
Define the RKHS norm as \(\|f\|_{\mathcal{H}_K} = \sqrt{\langle f, f\rangle_{\mathcal{H}_K}}\).
Lemma. The pairing \(\langle\cdot, \cdot\rangle_{\mathcal{H}_K}\) is indeed an inner product on \(\mathcal{H}_K^\circ\). Non-negativity is exactly the positive semi-definiteness of \(k\):
\[\sum_{i,j} a_i a_j k(x_i, x_j)\ge 0, \quad \forall a_i\in \mathbb{R}, x_i\in X.\]
What remains is definiteness:
\[\langle f, f\rangle_{\mathcal{H}_K} = 0\Rightarrow f=0.\]
Proof. For \(x\in X\), by Cauchy–Schwarz and the reproducing property,
\[\begin{align*} |f(x)|^2 &= |\langle f, k(x,\cdot)\rangle |^2 \\ &\le \|f\|_{\mathcal{H}_K}^2 \|k(x,\cdot)\|_{\mathcal{H}_K}^2 = \|f\|_{\mathcal{H}_K}^2\, k(x,x). \end{align*}\]
Hence if \(\|f\|_{\mathcal{H}_K}=0\), then \(f(x)=0\) for all \(x\in X\): the RKHS norm controls function values.
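The pointwise bound \(|f(x)|\le \|f\|_{\mathcal{H}_K}\sqrt{k(x,x)}\) can be visualized numerically (a sketch; the coefficients, centers, and grid are our choices, and \(k(x,x)=1\) for the Gaussian kernel):

```python
import numpy as np

def k(x, y):
    return np.exp(-(x - y)**2 / 2.0)    # Gaussian kernel, so k(x, x) = 1

a  = np.array([2.0, -1.0, 0.3])
xs = np.array([0.0, 0.4, -1.2])
K  = k(xs[:, None], xs[None, :])
norm_f = np.sqrt(a @ K @ a)             # ||f||_{H_K} for f = sum_i a_i k(x_i, .)

grid   = np.linspace(-5.0, 5.0, 1001)
f_vals = a @ k(xs[:, None], grid[None, :])       # f on the grid
bound  = norm_f * np.sqrt(k(grid, grid))         # ||f|| * sqrt(k(x, x))
pointwise_ok = np.all(np.abs(f_vals) <= bound + 1e-12)
```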
Take \(\mathcal{H}_K=\overline{\mathcal{H}_K^\circ}\), the completion of \(\mathcal{H}_K^\circ\) with respect to the norm \(\|\cdot\|_{\mathcal{H}_K}\).
Remark. The function space inherits properties from the kernel.
\(k\) is bounded \(\Leftrightarrow\) every \(f\in \mathcal{H}_K\) is a bounded function, \(\sup_{x\in X}|f(x)|<\infty\).
\(k\) is bounded and continuous \(\Leftrightarrow\) every \(f\in \mathcal{H}_K\) is bounded and continuous.
Example (Gaussian kernels). Let \(\sigma_2 > \sigma_1 > 0\) and let \(K_\sigma\) denote the Gaussian kernel with bandwidth \(\sigma\). Then \(\mathcal{H}_{K_{\sigma_2}}\subset \mathcal{H}_{K_{\sigma_1}}\): a larger bandwidth gives a smaller, smoother space.
Remark.
Computing the RKHS norm of a given function is difficult in general, because you need to write it as a linear combination of the functions \(k(x,\cdot)\).
Note that this integral is not the RKHS norm in general:
\[\int \int f(x)k(x,y)f(y)\,\mathrm{d}x\,\mathrm{d}y \neq \|f\|_{\mathcal{H}_K}^2.\]
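For a function of the form \(f=\sum_i c_i k(x_i,\cdot)\) both sides are computable, so the inequality can be seen concretely (a numerical sketch with the Gaussian kernel; the coefficients, centers, and grid are our choices):

```python
import numpy as np

def k(x, y):
    return np.exp(-(x - y)**2 / 2.0)         # Gaussian kernel, sigma = 1

c  = np.array([1.0, -1.0])
xs = np.array([0.0, 1.0])
K  = k(xs[:, None], xs[None, :])
rkhs_sq = c @ K @ c                          # ||f||_{H_K}^2 for f = sum_i c_i k(x_i, .)

# double integral  ∫∫ f(x) k(x, y) f(y) dx dy  via a Riemann sum on a grid
g  = np.linspace(-8.0, 9.0, 1701)           # tails of f are negligible outside
h  = g[1] - g[0]
fv = c @ k(xs[:, None], g[None, :])          # f on the grid
double_int = h * h * fv @ k(g[:, None], g[None, :]) @ fv
# the two quantities genuinely disagree for this f
```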
If \(f,g\) are close in \(\mathcal{H}_K\), then they are close point-wise.
\[ |f(x)-g(x)|\le C_x \|f-g\|_{\mathcal{H}_K}, \quad C_x = \sqrt{k(x,x)}.\]
In other words, the evaluation functional \(\delta_x : \mathcal{H}_K\to \mathbb{R}, f\mapsto f(x)\) is bounded, hence continuous.
The \(L^2\) norm \(\|f\|_{L^2(0,1)}^2 = \int_0^1 f(x)^2\,\mathrm{d}x\) does not have this property, because you can change the function on a small region without changing the norm much. A concrete example is the sequence \(f_n(x) = x^n\), whose \(L^2(0,1)\) norm tends to \(0\) while \(f_n(1)=1\) for every \(n\).
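The computation for \(f_n(x)=x^n\) is explicit, since \(\int_0^1 x^{2n}\,\mathrm{d}x = 1/(2n+1)\):

```python
import numpy as np

# ||x^n||_{L^2(0,1)}^2 = ∫_0^1 x^{2n} dx = 1 / (2n + 1)  ->  0 as n -> infinity,
# yet f_n(1) = 1^n = 1 for every n: the L^2 norm does not control pointwise values.
ns = np.array([1, 10, 100, 1000])
l2_norms = 1.0 / np.sqrt(2 * ns + 1)
values_at_1 = np.ones_like(ns, dtype=float)
```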
In any RKHS, function evaluations are continuous; conversely, any Hilbert space of functions with continuous evaluation functionals is an RKHS. For example, Sobolev spaces, where derivatives are controlled by the norm
\[ \|f\|_{W^1}^2 = \int \left|f(x)\right|^2 + \left|f'(x)\right|^2 \,\mathrm{d}x,\]
are RKHSs.
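For this Sobolev norm taken over all of \(\mathbb{R}\), a standard computation identifies the reproducing kernel as \(k(x,y) = \tfrac12 e^{-|x-y|}\), the Green's function of \(1 - \mathrm{d}^2/\mathrm{d}y^2\). The following sketch checks the reproducing property \(f(x_0) = \int f k + f'\,\partial_y k\,\mathrm{d}y\) numerically for one test function (the grid, test function, and evaluation point are our choices):

```python
import numpy as np

y  = np.linspace(-15.0, 15.0, 600001)    # fine grid; e^{-|x-y|} tails are negligible
h  = y[1] - y[0]
x0 = 0.3

f      = np.exp(-y**2)                   # test function f(y) = e^{-y^2}
fprime = -2.0 * y * np.exp(-y**2)
kx     = 0.5 * np.exp(-np.abs(x0 - y))                      # k(x0, y)
kx_dy  = 0.5 * np.sign(x0 - y) * np.exp(-np.abs(x0 - y))    # d/dy k(x0, y)

# <f, k(x0, .)>_{W^1} = ∫ f k + f' (d/dy k) dy, computed by a Riemann sum
inner  = h * np.sum(f * kx + fprime * kx_dy)
target = np.exp(-x0**2)                  # f(x0); the pairing should recover this
```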