Author: Eiko

Time: 2025-02-17 10:58:26 - 2025-02-17 10:58:26 (UTC)

In this course, we discuss how to use the RKHS, the reproducing kernel Hilbert space, to do machine learning.

We start by writing down positive definite kernels; these kernels will serve as building blocks.

Positive Definite Kernels

Definition(Positive Definite Kernels). Let \(X\neq\varnothing\) be any set, \(k:X\times X\to \mathbb{R}\) is called a positive definite kernel if

  • \(k\) is symmetric \(k(x,y)=k(y,x)\)

  • \(k\) is positive semi-definite, which means, for any \(n\in \mathbb{N}\), \(x_1,\dots,x_n\in X\), \(\alpha_i\in \mathbb{R}\), we have

    \[\sum_{i,j} \alpha_i \alpha_j k(x_i,x_j)\geq 0\]

    This is equivalent to say the matrix \((k(x_i,x_j))_{i,j}\) is positive semi-definite.

You can also think of this as defining a symmetric positive semi-definite bilinear form, i.e. an inner product (up to degeneracy), on the free vector space with distinguished basis \(X\).

Main Example: Squared-Exponential (Gaussian) Kernel

\[k(x,y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)\]

is a positive definite kernel.
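As a quick numerical sanity check, we can build a Gram matrix on a few sample points and confirm it is symmetric positive semi-definite (a sketch using NumPy; the sample points and \(\sigma\) are arbitrary choices):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Squared-exponential kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# Gram matrix on a few random points in R^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Symmetric and positive semi-definite: all eigenvalues >= 0 (up to round-off).
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() >= -1e-10
```

In fact, for the Gaussian kernel the Gram matrix is strictly positive definite whenever the points are distinct.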

Remarks

  • Intuition: the values \(k(x,y)\) encode the similarity between \(x\) and \(y\).

  • The continuous analogue of positive semi-definiteness: for suitable functions \(f\),

    \[\int\int k(x,y)f(x)f(y)\,\mathrm{d}x\,\mathrm{d}y\ge 0.\]

  • \(k(x,y)\ge 0\) does not mean \(k\) is positive definite.
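The last remark can be made concrete: a "boxcar" kernel (our own illustrative choice, not from the lecture) is pointwise nonnegative yet fails positive semi-definiteness on as few as three points.

```python
import numpy as np

# Boxcar kernel: pointwise nonnegative, yet not positive semi-definite.
boxcar = lambda x, y: float(abs(x - y) <= 1.0)

pts = [0.0, 1.0, 2.0]
K = np.array([[boxcar(x, y) for y in pts] for x in pts])
# K = [[1, 1, 0],
#      [1, 1, 1],
#      [0, 1, 1]], det(K) = -1, so K has a negative eigenvalue.
print(np.linalg.eigvalsh(K).min())  # prints 1 - sqrt(2), approx -0.4142
```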

Further Examples Of Positive Definite Kernels

\[k(x,y) = \exp\left(-\frac{|x-y|^p}{\sigma^p}\right)\]

It is positive definite for \(p\in [1,2]\).

Polynomial kernels:

\[k(x,y) = (x^Ty+c)^m, \quad x,y\in \mathbb{R}^d, c>0, m\in \mathbb{N}.\]

Sums And Products

Given two positive definite kernels \(k_1,k_2 : X\times X\to \mathbb{R}\), the sum and pointwise product

\[\begin{align*} (k_1+k_2)(x,y) & := k_1(x,y)+k_2(x,y) \\ (k_1k_2)(x,y) & := k_1(x,y)\cdot k_2(x,y) \end{align*}\]

are again positive definite kernels.

Scaling: if \(f:X\to \mathbb{R}\) is any function and \(k:X\times X\to \mathbb{R}\) is positive definite, then

\[k_f(x,y) := f(x)f(y)k(x,y)\]

will also be positive definite.
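These closure properties can be checked numerically on random point sets (a sketch; the two kernels and the scaling function are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))

def gram(kernel):
    """Gram matrix of `kernel` on the sample points X."""
    return np.array([[kernel(x, y) for y in X] for x in X])

k1 = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2)  # Gaussian, sigma = 1
k2 = lambda x, y: (x @ y + 1.0) ** 2                 # polynomial, c = 1, m = 2
f  = lambda x: np.sin(x[0])                          # arbitrary scaling function

# Sum, product, and scaling of positive definite kernels stay
# positive semi-definite on any finite point set.
for kern in (lambda x, y: k1(x, y) + k2(x, y),
             lambda x, y: k1(x, y) * k2(x, y),
             lambda x, y: f(x) * f(y) * k1(x, y)):
    assert np.linalg.eigvalsh(gram(kern)).min() >= -1e-9
```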

Reproducing Kernel Hilbert Spaces (RKHS)

\[\mathcal{H}_K = \{ \text{limit of linear combinations of functions of the form $k(\cdot,x)$} \}\]

Construction

Step 1. Fix a positive definite kernel \(k:X\times X\to \mathbb{R}\).

\[\begin{align*} \mathcal{H}_K^\circ &= \operatorname{span}_{\mathbb{R}}\{ k(x, \cdot) : x\in X \} \\ &= \left\{ f=\sum_{i=1}^n c_i k(x_i,\cdot) : n\in \mathbb{N}, c_i\in \mathbb{R}, x_i\in X \right\} \end{align*}\]

Step 2. Construct an inner product \(\langle \cdot, \cdot \rangle_K\) on \(\mathcal{H}_K^\circ\) by demanding that

\[f(x) = \langle f, k(x,\cdot) \rangle_K, \quad f\in \mathcal{H}_K^\circ.\]

This is called the reproducing property. The point of this property is that it makes computing inner products on an infinite-dimensional space very easy: pairing against a kernel section reduces to evaluating the function at a point.

The reproducing property determines the inner product on \(\mathcal{H}_K^\circ\): writing

\[f = \sum_i a_i k(x_i, \cdot), \quad g = \sum_j b_j k(y_j, \cdot),\]

bilinearity forces

\[\langle f, g\rangle_{\mathcal{H}_K} = \sum_{i,j} a_i b_j k(x_i,y_j).\]

Define the RKHS norm as \(\|f\|_{\mathcal{H}_K} = \sqrt{\langle f, f\rangle_{\mathcal{H}_K}}\).
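These formulas can be checked numerically (a sketch; the Gaussian kernel, the coefficients, and the centers are arbitrary choices):

```python
import numpy as np

def k(x, y, sigma=1.0):
    """Gaussian kernel on X = R."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

# f = sum_i a_i k(x_i, .) with some arbitrary coefficients and centers.
a  = np.array([1.0, -0.5, 2.0])
xs = np.array([0.0, 1.0, -1.0])
f  = lambda t: sum(ai * k(xi, t) for ai, xi in zip(a, xs))

# <f, g> = sum_{i,j} a_i b_j k(x_i, y_j); taking g = k(y, .) (i.e. b = [1])
# recovers the reproducing property f(y) = <f, k(y, .)>.
y = 0.3
inner = sum(ai * k(xi, y) for ai, xi in zip(a, xs))
assert np.isclose(inner, f(y))

# The squared RKHS norm is ||f||^2 = a^T K a with K_ij = k(x_i, x_j).
K = np.array([[k(xi, xj) for xj in xs] for xi in xs])
norm_sq = a @ K @ a
assert norm_sq >= 0
```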

Lemma. The pairing \(\langle\cdot, \cdot\rangle_{\mathcal{H}_K}\) is indeed an inner product on \(\mathcal{H}_K^\circ\). Symmetry and positivity are immediate from the definition of a positive definite kernel,

\[\sum_{i,j} a_i a_j k(x_i, x_j)\ge 0, \quad \forall a_i\in \mathbb{R}, x_i\in X.\]

The only nontrivial point is definiteness:

\[\langle f, f\rangle_{\mathcal{H}_K} = 0\Rightarrow f=0.\]

Proof. For \(x\in X\),

\[\begin{align*} |f(x)|^2 &= |\langle f, k(x,\cdot)\rangle |^2 \\ &\le \|f\|_{\mathcal{H}_K}^2 \|k(x,\cdot)\|_{\mathcal{H}_K}^2 = \|f\|_{\mathcal{H}_K}^2\, k(x,x), \end{align*}\]

using Cauchy–Schwarz and the reproducing property \(\|k(x,\cdot)\|_{\mathcal{H}_K}^2 = k(x,x)\).

This means that if \(\|f\|_{\mathcal{H}_K}=0\), then \(f(x)=0\) for all \(x\in X\): the RKHS norm controls function values.
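This pointwise bound can be verified on a grid (again a sketch with arbitrary coefficients; for the Gaussian kernel \(k(x,x)=1\)):

```python
import numpy as np

def k(x, y, sigma=1.0):
    """Gaussian kernel; note k(x, x) = 1."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

a  = np.array([1.0, -2.0, 0.7])
xs = np.array([0.0, 0.5, 2.0])
K  = np.array([[k(xi, xj) for xj in xs] for xi in xs])

f = lambda t: a @ np.array([k(xi, t) for xi in xs])
norm_f = np.sqrt(a @ K @ a)

# |f(x)| <= ||f||_{H_K} * sqrt(k(x, x)): the RKHS norm controls values.
grid = np.linspace(-3.0, 3.0, 201)
assert all(abs(f(t)) <= norm_f * np.sqrt(k(t, t)) + 1e-12 for t in grid)
```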

Step 3. Taking Completion

Take \(\mathcal{H}_K=\overline{\mathcal{H}_K^\circ}\), the completion of \(\mathcal{H}_K^\circ\) with respect to the norm \(\|\cdot\|_{\mathcal{H}_K}\).

Remark. The function space inherits properties from the kernel.

  • \(k\) is bounded \(\Leftrightarrow\) every \(f\in \mathcal{H}_K\) is a bounded function, since \(|f(x)|\le \|f\|_{\mathcal{H}_K}\sqrt{k(x,x)}\).

  • \(k\) is bounded and continuous \(\Leftrightarrow\) every \(f\in \mathcal{H}_K\) is bounded and continuous.

Gaussian RKHS For Different Widths

Let \(\sigma_2 > \sigma_1 > 0\). The corresponding Gaussian RKHSs are nested, \(\mathcal{H}_{K_{\sigma_2}}\subset \mathcal{H}_{K_{\sigma_1}}\): a wider kernel gives a smaller space of smoother functions.

Remark.

  • Computing the RKHS norm of a given function is difficult, because you need to write it as a limit of linear combinations of \(k(x,\cdot)\).

  • This integral is not the RKHS norm in general:

    \[\int \int f(x)k(x,y)f(y)\,\mathrm{d}x\,\mathrm{d}y \neq \|f\|_{\mathcal{H}_K}^2.\]

Function Evaluations, Interpretation Of RKHS Norm

If \(f,g\) are close in \(\mathcal{H}_K\), then they are close point-wise:

\[ |f(x)-g(x)|\le C_x \|f-g\|_{\mathcal{H}_K}, \quad C_x = \sqrt{k(x,x)}.\]

In other words, the evaluation functional \(\delta_x : \mathcal{H}_K\to \mathbb{R}, f\mapsto f(x)\) is bounded, hence continuous.

Let’s Look At A Counter Example

  • \(\|f\|_{L^2(0,1)}^2 = \int_0^1 f(x)^2\,\mathrm{d}x\); this norm does not have the above property! You can change a function's values on a small region without changing the norm much. A concrete example is \(q_n(x) = x^n\): \(\|q_n\|_{L^2(0,1)}^2 = \frac{1}{2n+1}\to 0\), yet \(q_n\) takes values arbitrarily close to \(1\) near the right endpoint.

  • In any RKHS, function evaluations are continuous. Conversely, any Hilbert space of functions in which all function evaluations are continuous is an RKHS. For example, Sobolev spaces in one dimension, where derivatives are controlled by the norm

    \[ \|f\|_{W^1}^2 = \int \left|f(x)\right|^2 + \left|f'(x)\right|^2 \,\mathrm{d}x,\]

    are RKHS.
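The \(L^2(0,1)\) counterexample \(q_n = x^n\) from above can be checked numerically:

```python
import numpy as np

# q_n(x) = x^n on (0, 1): the L^2 norm tends to 0, ||q_n||^2 = 1/(2n + 1),
# but sup_{0 < x < 1} q_n(x) = 1 for every n, so closeness in L^2 does not
# control function values near x = 1.
for n in (1, 10, 100, 1000):
    l2_norm = np.sqrt(1.0 / (2 * n + 1))                  # exact ||x^n||_{L^2(0,1)}
    sup_val = np.max(np.linspace(0, 1, 10001)[:-1] ** n)  # numerical sup on (0, 1)
    print(n, round(l2_norm, 4), round(sup_val, 4))        # norm shrinks, sup does not
```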