In this course, we discuss how to use the reproducing kernel Hilbert space (RKHS) to do machine learning.
We begin by writing down positive definite kernels; these kernels will serve as our building blocks.
Definition (Positive Definite Kernels). Let \(X\neq\varnothing\) be any set. A map \(k:X\times X\to \mathbb{R}\) is called a positive definite kernel if
\(k\) is symmetric: \(k(x,y)=k(y,x)\) for all \(x,y\in X\);
\(k\) is positive semi-definite: for any \(n\in \mathbb{N}\), \(x_1,\dots,x_n\in X\), and \(\alpha_1,\dots,\alpha_n\in \mathbb{R}\), we have
\[\sum_{i,j} \alpha_i \alpha_j k(x_i,x_j)\geq 0\]
This is equivalent to saying that the Gram matrix \((k(x_i,x_j))_{i,j}\) is positive semi-definite.
You can also view \(k\) as defining a symmetric positive semi-definite bilinear form (an inner product, up to degeneracy) on the vector space \(\mathbb{R}^X\) with distinguished basis \(X\).
Example (Gaussian kernel). For \(X=\mathbb{R}^d\) and \(\sigma>0\),
\[k(x,y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)\]
is a positive definite kernel.
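As a quick numerical sanity check (a sketch using NumPy; the function name, sample points, and tolerances are our choices, not from the lecture), one can sample points, build the Gaussian Gram matrix, and verify symmetry and non-negative eigenvalues:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix with entries k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))              # 50 points in R^3
K = gaussian_kernel(X, X)

symmetric = np.allclose(K, K.T)           # k(x, y) = k(y, x)
min_eig = np.linalg.eigvalsh(K).min()     # should be >= 0 up to round-off
```

Of course, passing such a check for one sample does not prove positive definiteness; it only fails to refute it.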
Remarks
Intuition: the values \(k(x,y)\) encode the similarity between \(x\) and \(y\).
For \(X=\mathbb{R}^d\), positive semi-definiteness has a continuous analogue: for suitable \(f\),
\[\int\int k(x,y)f(x)f(y)\,\mathrm{d}x\,\mathrm{d}y\ge 0.\]
Note that \(k(x,y)\ge 0\) pointwise does not imply that \(k\) is positive definite.
\[k(x,y) = \exp\left(-\frac{|x-y|^p}{\sigma^p}\right)\]
It is positive definite for \(p\in [1,2]\).
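A numerical spot check of this family (again only a sketch; the sample and exponents are our choices) computes the smallest Gram eigenvalue for a few \(p\in[1,2]\):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=40)          # 40 points on the line
D = np.abs(x[:, None] - x[None, :])          # pairwise distances |x_i - x_j|

# smallest eigenvalue of exp(-D^p) for a few exponents, sigma = 1
min_eigs = {p: np.linalg.eigvalsh(np.exp(-D**p)).min() for p in (1.0, 1.5, 2.0)}
# each minimum eigenvalue is >= 0 up to round-off, consistent with the claim
```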
Polynomial kernels:
\[k(x,y) = (x^Ty+c)^m, \quad x,y\in \mathbb{R}^d, c>0, m\in \mathbb{N}.\]
Given two positive definite kernels \(k_1,k_2 : X\times X\to \mathbb{R}\), the following are again positive definite kernels:
\[\begin{align*} (k_1+k_2)(x,y) & := k_1(x,y)+k_2(x,y) \\ k_1k_2(x,y) & := k_1(x,y)\cdot k_2(x,y) \end{align*}\]
Scaling: if \(f:X\to \mathbb{R}\) is any function and \(k:X\times X\to \mathbb{R}\) is positive definite, then
\[k_f(x,y) := f(x)f(y)k(x,y)\]
will also be positive definite.
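All three closure properties can be spot-checked on Gram matrices (a sketch; the kernels, points, and scaling function below are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
K1 = np.exp(-np.abs(x[:, None] - x[None, :]))      # Laplacian kernel (p = 1 above)
K2 = (np.outer(x, x) + 1.0) ** 2                   # polynomial kernel, c = 1, m = 2
f  = np.sin(x)                                     # an arbitrary scaling function

candidates = {
    "sum":     K1 + K2,
    "product": K1 * K2,                            # entrywise (Schur) product
    "scaled":  np.outer(f, f) * K1,                # f(x_i) f(x_j) k_1(x_i, x_j)
}
min_eigs = {name: np.linalg.eigvalsh(K).min() for name, K in candidates.items()}
# all minimum eigenvalues are >= 0 up to round-off
```

The product case is exactly the Schur product theorem: the entrywise product of positive semi-definite matrices is positive semi-definite.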
The associated RKHS is
\[\mathcal{H}_K = \{ \text{limits of linear combinations of functions of the form $k(x,\cdot)$} \}\]
Construction
\[\begin{align*} \mathcal{H}_K^\circ &= \mathbb{R}\{ k(x, \cdot) : x\in X \} \\ &= \bigoplus_{x\in X} \mathbb{R}k(x,\cdot) \\ &= \left\{ f=\sum_{i=1}^n c_i k(x_i,\cdot) : n\in \mathbb{N}, c_i\in \mathbb{R}, x_i\in X \right\} \end{align*}\]
We equip \(\mathcal{H}_K^\circ\) with an inner product \(\langle\cdot,\cdot\rangle_K\) such that
\[f(x) = \langle f, k(x,\cdot) \rangle_K, \quad f\in \mathcal{H}_K^\circ.\]
This is called the reproducing property. Its point is that it reduces the computation of an inner product on an infinite dimensional function space to evaluating the kernel, i.e. to a single number.
The reproducing property determines the inner product on \(\mathcal{H}_K^\circ\): writing
\[f = \sum_i a_i k(x_i, \cdot), \quad g = \sum_j b_j k(y_j, \cdot),\]
we must have
\[\langle f, g\rangle_{\mathcal{H}_K} = \sum_{i,j} a_i b_j k(x_i,y_j).\]
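This formula is directly computable; the following sketch (coefficients, centers, and the Gaussian kernel are our choices) evaluates \(\langle f,g\rangle_{\mathcal{H}_K}\) and checks the reproducing property at one point:

```python
import numpy as np

def k(x, y, sigma=1.0):
    """Gaussian kernel on R; any positive definite kernel would do."""
    return np.exp(-(x - y)**2 / (2.0 * sigma**2))

# f = sum_i a_i k(x_i, .)   and   g = sum_j b_j k(y_j, .)
a, xs = np.array([1.0, -2.0, 0.5]), np.array([0.0, 1.0, -1.0])
b, ys = np.array([3.0, 1.0]),       np.array([0.5, 2.0])

inner_fg = a @ k(xs[:, None], ys[None, :]) @ b   # sum_{ij} a_i b_j k(x_i, y_j)
norm_sq  = a @ k(xs[:, None], xs[None, :]) @ a   # <f, f> >= 0 by PSD of k

def f(t):
    return np.sum(a * k(xs, t))                  # pointwise evaluation of f

# reproducing property: f(x0) equals the pairing of f with k(x0, .)
x0  = 0.7
lhs = f(x0)
rhs = a @ k(xs, x0)                              # <f, k(x0, .)> by the formula
```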
Define the RKHS norm as \(\|f\|_{\mathcal{H}_K} = \sqrt{\langle f, f\rangle_{\mathcal{H}_K}}\).
Lemma. The pairing \(\langle\cdot, \cdot\rangle_{\mathcal{H}_K}\) is indeed an inner product on \(\mathcal{H}_K^\circ\). Non-negativity is exactly the positive semi-definiteness of \(k\):
\[\sum_{i,j} a_i a_j k(x_i, x_j)\ge 0, \quad \forall a_i\in \mathbb{R}, x_i\in X.\]
What remains is definiteness:
\[\langle f, f\rangle_{\mathcal{H}_K} = 0\Rightarrow f=0.\]
Proof. For \(x\in X\), by Cauchy–Schwarz and the reproducing property,
\[\begin{align*} |f(x)|^2 &= |\langle f, k(x,\cdot)\rangle |^2 \\ &\le \|f\|_{\mathcal{H}_K}^2 \|k(x,\cdot)\|_{\mathcal{H}_K}^2 = \|f\|_{\mathcal{H}_K}^2\, k(x,x). \end{align*}\]
Hence if \(\|f\|_{\mathcal{H}_K}=0\), then \(f(x)=0\) for all \(x\in X\): the RKHS norm controls function values.
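The pointwise bound \(|f(x)|\le \|f\|_{\mathcal{H}_K}\sqrt{k(x,x)}\) can be visualized numerically (a sketch; the coefficients, centers, and grid are our choices, and \(k(x,x)=1\) for the Gaussian kernel):

```python
import numpy as np

def k(x, y):
    return np.exp(-(x - y)**2 / 2.0)    # Gaussian kernel, so k(x, x) = 1

a  = np.array([2.0, -1.0, 0.3])
xs = np.array([0.0, 0.4, -1.2])
K  = k(xs[:, None], xs[None, :])
norm_f = np.sqrt(a @ K @ a)             # ||f||_{H_K} for f = sum_i a_i k(x_i, .)

grid   = np.linspace(-5.0, 5.0, 1001)
f_vals = a @ k(xs[:, None], grid[None, :])       # f on the grid
bound  = norm_f * np.sqrt(k(grid, grid))         # ||f|| * sqrt(k(x, x))
pointwise_ok = np.all(np.abs(f_vals) <= bound + 1e-12)
```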
Take \(\mathcal{H}_K=\overline{\mathcal{H}_K^\circ}\), the completion of \(\mathcal{H}_K^\circ\) with respect to the norm \(\|\cdot\|_{\mathcal{H}_K}\).
Remark. The function space inherits properties from the kernel.
\(k\) is bounded \(\Leftrightarrow\) every \(f\in \mathcal{H}_K\) is a bounded function, \(\sup_{x\in X}|f(x)|<\infty\).
\(k\) is bounded and continuous \(\Leftrightarrow\) every \(f\in \mathcal{H}_K\) is bounded and continuous.
Example (Gaussian kernels). Let \(\sigma_2 > \sigma_1 > 0\) and let \(K_\sigma\) denote the Gaussian kernel with bandwidth \(\sigma\). Then \(\mathcal{H}_{K_{\sigma_2}}\subset \mathcal{H}_{K_{\sigma_1}}\): a larger bandwidth gives a smaller, smoother space.
Remark.
Computing the RKHS norm of a given function is difficult in general, because you need to write it as a linear combination of the functions \(k(x,\cdot)\).
Note that this integral is not the RKHS norm in general:
\[\int \int f(x)k(x,y)f(y)\,\mathrm{d}x\,\mathrm{d}y \neq \|f\|_{\mathcal{H}_K}^2.\]
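For a function of the form \(f=\sum_i c_i k(x_i,\cdot)\) both sides are computable, so the inequality can be seen concretely (a numerical sketch with the Gaussian kernel; the coefficients, centers, and grid are our choices):

```python
import numpy as np

def k(x, y):
    return np.exp(-(x - y)**2 / 2.0)         # Gaussian kernel, sigma = 1

c  = np.array([1.0, -1.0])
xs = np.array([0.0, 1.0])
K  = k(xs[:, None], xs[None, :])
rkhs_sq = c @ K @ c                          # ||f||_{H_K}^2 for f = sum_i c_i k(x_i, .)

# double integral  ∫∫ f(x) k(x, y) f(y) dx dy  via a Riemann sum on a grid
g  = np.linspace(-8.0, 9.0, 1701)           # tails of f are negligible outside
h  = g[1] - g[0]
fv = c @ k(xs[:, None], g[None, :])          # f on the grid
double_int = h * h * fv @ k(g[:, None], g[None, :]) @ fv
# the two quantities genuinely disagree for this f
```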
If \(f,g\) are close in \(\mathcal{H}_K\), then they are close point-wise.
\[ |f(x)-g(x)|\le C_x \|f-g\|_{\mathcal{H}_K}, \quad C_x = \sqrt{k(x,x)}.\]
In other words, the evaluation functional \(\delta_x : \mathcal{H}_K\to \mathbb{R}, f\mapsto f(x)\) is bounded, hence continuous.
The \(L^2\) norm \(\|f\|_{L^2(0,1)}^2 = \int_0^1 f(x)^2\,\mathrm{d}x\) does not have this property, because you can change the function on a small region without changing the norm much. A concrete example is the sequence \(f_n(x) = x^n\), whose \(L^2(0,1)\) norm tends to \(0\) while \(f_n(1)=1\) for every \(n\).
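The computation for \(f_n(x)=x^n\) is explicit, since \(\int_0^1 x^{2n}\,\mathrm{d}x = 1/(2n+1)\):

```python
import numpy as np

# ||x^n||_{L^2(0,1)}^2 = ∫_0^1 x^{2n} dx = 1 / (2n + 1)  ->  0 as n -> infinity,
# yet f_n(1) = 1^n = 1 for every n: the L^2 norm does not control pointwise values.
ns = np.array([1, 10, 100, 1000])
l2_norms = 1.0 / np.sqrt(2 * ns + 1)
values_at_1 = np.ones_like(ns, dtype=float)
```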
In any RKHS, function evaluations are continuous; conversely, any Hilbert space of functions with continuous evaluation functionals is an RKHS. For example, Sobolev spaces, where derivatives are controlled by the norm
\[ \|f\|_{W^1}^2 = \int \left|f(x)\right|^2 + \left|f'(x)\right|^2 \,\mathrm{d}x,\]
are RKHSs.
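For this Sobolev norm taken over all of \(\mathbb{R}\), a standard computation identifies the reproducing kernel as \(k(x,y) = \tfrac12 e^{-|x-y|}\), the Green's function of \(1 - \mathrm{d}^2/\mathrm{d}y^2\). The following sketch checks the reproducing property \(f(x_0) = \int f k + f'\,\partial_y k\,\mathrm{d}y\) numerically for one test function (the grid, test function, and evaluation point are our choices):

```python
import numpy as np

y  = np.linspace(-15.0, 15.0, 600001)    # fine grid; e^{-|x-y|} tails are negligible
h  = y[1] - y[0]
x0 = 0.3

f      = np.exp(-y**2)                   # test function f(y) = e^{-y^2}
fprime = -2.0 * y * np.exp(-y**2)
kx     = 0.5 * np.exp(-np.abs(x0 - y))                      # k(x0, y)
kx_dy  = 0.5 * np.sign(x0 - y) * np.exp(-np.abs(x0 - y))    # d/dy k(x0, y)

# <f, k(x0, .)>_{W^1} = ∫ f k + f' (d/dy k) dy, computed by a Riemann sum
inner  = h * np.sum(f * kx + fprime * kx_dy)
target = np.exp(-x0**2)                  # f(x0); the pairing should recover this
```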