Given a positive definite kernel \(k:X\times X\to \mathbb{R}\), define
\[\mathcal{H}_K = \overline{\mathrm{span}\{ k(x,\cdot) : x\in X\}},\]
the closure being taken with respect to the inner product determined by \(\langle k(x,\cdot), k(y,\cdot)\rangle = k(x,y)\).
Equivalently, \(\mathcal{H}_K\) is the unique Hilbert space of functions on \(X\) satisfying the following two properties:
\(k(x,\cdot)\in \mathcal{H}_K\) for all \(x\in X\);
\(\langle f, k(x,\cdot)\rangle_{\mathcal{H}_K} = f(x)\) for all \(f\in \mathcal{H}_K,\ x\in X\) (the reproducing property).
More generally, an RKHS is a Hilbert space of functions on \(X\), \(\mathcal{H}_K\subseteq\{f:X\to \mathbb{R}\}\), such that every evaluation map \(\delta_x:\mathcal{H}_K\to \mathbb{R}\), \(\delta_x(f)=f(x)\), is bounded (equivalently, continuous).
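As a quick numerical illustration (the Gaussian kernel and the sample points below are just an example, not anything fixed by these notes): for a positive definite \(k\), every Gram matrix \(K_{ij}=k(x_i,x_j)\) built from finitely many points is symmetric positive semidefinite.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * lengthscale**2))

# A few points in X = R^2 and their Gram matrix K_ij = k(x_i, x_j).
points = np.array([[0.0, 0.0], [1.0, 0.5], [-0.3, 2.0], [2.0, -1.0]])
K = np.array([[rbf_kernel(xi, xj) for xj in points] for xi in points])

# Positive definiteness of k means the Gram matrix is symmetric PSD.
print(np.allclose(K, K.T))                        # True
print(np.all(np.linalg.eigvalsh(K) >= -1e-12))    # True, up to round-off
```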
Let
\[H^s(\mathbb{R}^d) = \left\{ f\in L^2(\mathbb{R}^d) : \|f\|_{H^s}<\infty \right\}\]
where, for a nonnegative integer \(s\) (the derivatives \(\partial^\alpha\) taken in the weak sense),
\[\|f\|_{H^s}^2 = \sum_{|\alpha|\leq s} \int_{\mathbb{R}^d} |\partial^\alpha f(x)|^2 \,dx.\]
Remarks
In one dimension, \(H^1\) already has continuous point evaluations, so it is an RKHS (an explicit kernel is worked out after these remarks). In higher dimensions, controlling only first-order derivatives is no longer enough.
\(H^s(\mathbb{R}^d)\) is an RKHS iff \(s>\frac{d}{2}\) (ref: Sobolev embedding theorem).
If one allows more general notions of derivative (weak derivatives, fractional orders, etc.), then many RKHSs can in turn be viewed as Sobolev-type spaces.
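As a concrete instance of the first remark (the explicit kernel below is a standard fact, stated for illustration rather than derived from the definitions above): on \(H^1(\mathbb{R})\) with the norm \(\|f\|_{H^1}^2=\int f^2+(f')^2\), the reproducing kernel is \(k(x,y)=\tfrac12 e^{-|x-y|}\). Indeed, integrating by parts,
\[\langle f, k(x,\cdot)\rangle_{H^1} = \int f(y)\,k(x,y)\,dy + \int f'(y)\,\partial_y k(x,y)\,dy = \int f(y)\bigl(k(x,y)-\partial_y^2 k(x,y)\bigr)\,dy = f(x),\]
because \(\partial_y^2 k(x,y)=k(x,y)\) for \(y\neq x\) and \(\partial_y k(x,\cdot)\) jumps by \(-1\) at \(y=x\), so \(k(x,\cdot)-\partial_y^2 k(x,\cdot)=\delta_x\) in the sense of distributions.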
Given data, a finite subset \(\{(x_i,y_i)\}_{i=1}^n\subset X\times \mathbb{R}\), we want to find a function \(f:X\to \mathbb{R}\) in the space \(\mathcal{H}_K\) fitting the data, \(f(x_i)\approx y_i\).
Problem. Fix \(k:X^2\to\mathbb{R}\), minimize \[\mathbb{E}(f(x_i)-y_i)^2 + \lambda \|f\|_{\mathcal{H}_K}^2, \quad f\in \mathcal{H}_K,\]
where \(\lambda\ge 0\) is the regularization parameter and \(\mathbb{E}\) denotes the sum over the data points \(i=1,\dots,n\) (averaging instead would only rescale \(\lambda\)).
Theorem (representer theorem). If \(f^*\) minimizes the above problem for \(\lambda>0\), then \(f^*\) must be a finite linear combination of the kernel functions at the data points: \(f^*=\sum_i \alpha_i\, k(x_i,\cdot)\) for some \(\alpha_i\in\mathbb{R}\).
To prove this theorem we use the orthogonal decomposition \(f = g + h \in \mathcal{U}_D\oplus \mathcal{U}_D^\perp\), where \(\mathcal{U}_D = \mathrm{span}\{k(x_i,\cdot)\}\) is the finite-dimensional subspace spanned by the kernel functions at the data points. Since \(h\perp \mathcal{U}_D\), the reproducing property gives \(h(x_i)=\langle h, k(x_i,\cdot)\rangle = 0\), and \(\|f\|^2 = \|g\|^2+\|h\|^2\). Substituting into the objective,
\[\mathbb{E}(g(x_i)+h(x_i) - y_i)^2 + \lambda \|g\|_{\mathcal{H}_K}^2 + \lambda \|h\|_{\mathcal{H}_K}^2 = \mathbb{E}(g(x_i) - y_i)^2 + \lambda \left(\|g\|_{\mathcal{H}_K}^2 + \|h\|_{\mathcal{H}_K}^2\right).\]
Minimizing therefore forces \(h=0\) (unless \(\lambda=0\)), so the minimizer lies in \(\mathcal{U}_D\).
Writing \(e_i = k(x_i,\cdot)\), so that \(f(x_i)=\langle f, e_i\rangle\), we are now trying to minimize
\[S=\mathbb{E}(\langle f, e_i\rangle - y_i)^2 + \lambda \langle f, f\rangle\]
If we differentiate \(S\) at \(f = \sum_i f_i e_i\) in an arbitrary direction \(Df\),
\[\begin{align*} DS &= \mathbb{E}2(\langle f, e_i\rangle - y_i)\langle Df, e_i\rangle + 2\lambda \langle Df, f\rangle\\ &= 2\langle Df, \mathbb{E}(\langle f,e_i\rangle - y_i)e_i+\lambda f\rangle\\ &= 2\langle Df, \mathbb{E}(\langle f,e_i\rangle + \lambda f_i- y_i)e_i\rangle\\ \end{align*}\]
and set it to zero for all directions \(Df\), we get
\[ \langle f, e_i\rangle + \lambda f_i - y_i = 0 \]
In terms of matrices, since \(\langle f, e_i\rangle = \sum_j f_j\, k(x_j,x_i) = (K\underline{f})_i\), this is
\[ (K+\lambda I)\, \underline{f} = \underline{y}, \]
where \(K_{ij} = k(x_i,x_j)\), \(\underline{f}=(f_1,\dots,f_n)\) is the coefficient vector, and \(\underline{y}=(y_1,\dots,y_n)\).
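A minimal numerical sketch of this last step (the one-dimensional toy data, the Gaussian kernel, and the value of \(\lambda\) are made-up choices for illustration): solve \((K+\lambda I)\,\underline{f}=\underline{y}\) for the coefficients and evaluate \(f^*(x)=\sum_i f_i\, k(x_i,x)\).

```python
import numpy as np

def k(x, y, lengthscale=1.0):
    """Gaussian kernel on R; any positive definite kernel would do."""
    return np.exp(-(x - y) ** 2 / (2.0 * lengthscale ** 2))

# Toy data (x_i, y_i); purely illustrative.
x_data = np.array([-2.0, -0.5, 0.3, 1.1, 2.4])
y_data = np.sin(x_data)
lam = 0.1  # regularization parameter lambda > 0

# Gram matrix K_ij = k(x_i, x_j) and coefficients from (K + lam*I) f = y.
K = k(x_data[:, None], x_data[None, :])
coef = np.linalg.solve(K + lam * np.eye(len(x_data)), y_data)

# The minimizer f*(x) = sum_i coef_i * k(x_i, x), a finite combination of kernel functions.
def f_star(x):
    return k(x_data, x) @ coef

print(f_star(0.0))  # fitted value at a new point
```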