Author: Eiko
Time: 2025-03-03 10:47:12 - 2025-03-03 10:47:12 (UTC)
Lecturer: Nik at KCL, Computational Statistics
Note taker and remarker: Eiko. I write things down in my own style and add some personal comments.
Kernel Methods - Lecture 3
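The most important equation is the reproducing property of the kernel. Recall that it says, for every $f$ in the RKHS $\mathcal{H}$ with kernel $k$ and every point $x$,

$$f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}.$$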
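Derivative reproducing property: if $f \in \mathcal{H}$ and $k$ is differentiable, then

$$\partial_x f(x) = \langle f, \partial_x k(x, \cdot) \rangle_{\mathcal{H}},$$

more generally (for $k$ sufficiently smooth) the higher order derivatives can be computed as

$$\partial^{\alpha} f(x) = \langle f, \partial_x^{\alpha} k(x, \cdot) \rangle_{\mathcal{H}}.$$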
Proof.
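Just observe that, by the reproducing property and linearity of the inner product,

$$\frac{f(x+h) - f(x)}{h} = \left\langle f, \frac{k(x+h, \cdot) - k(x, \cdot)}{h} \right\rangle_{\mathcal{H}}.$$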
For , look at
Continuous linear functionals on a Hilbert space are, by the Riesz representation theorem, induced by the inner product. So since the evaluation maps $f \mapsto f(x)$ and $f \mapsto \partial^{\alpha} f(x)$ are continuous, they can be written as inner products with some functions in $\mathcal{H}$.
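The derivative reproducing property can be checked numerically. The following is a minimal sketch, assuming a one-dimensional Gaussian kernel; the bandwidth `SIGMA`, the centers and the coefficients are arbitrary illustrative choices, not taken from the lecture. For $f = \sum_i \alpha_i k(x_i, \cdot)$ the inner product $\langle f, \partial_x k(x, \cdot) \rangle$ reduces to $\sum_i \alpha_i \partial_x k(x, x_i)$, which is compared against a finite difference of $f$.

```python
import math

SIGMA = 0.7  # bandwidth of the Gaussian kernel (arbitrary choice)

def k(x, y):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-(x - y) ** 2 / (2 * SIGMA ** 2))

def dk_dx(x, y):
    """Partial derivative of k with respect to its first argument."""
    return -(x - y) / SIGMA ** 2 * k(x, y)

# f = sum_i alpha_i k(x_i, .) is an element of the RKHS of k
centers = [-1.0, 0.3, 1.5]
alphas = [0.5, -1.2, 2.0]

def f(x):
    return sum(a * k(c, x) for a, c in zip(alphas, centers))

def df_via_kernel(x):
    # <f, d/dx k(x, .)> = sum_i alpha_i d/dx k(x, x_i) by the reproducing property
    return sum(a * dk_dx(x, c) for a, c in zip(alphas, centers))

def df_finite_difference(x, h=1e-5):
    # central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.4
print(df_via_kernel(x0), df_finite_difference(x0))
```

The two printed numbers should agree up to the finite difference error.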
Generalized Representer Theorem
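If $L_1, \dots, L_n$ are continuous linear functionals on $\mathcal{H}$ and $f^*$ minimizes a regularized objective of the form $\ell(L_1 f, \dots, L_n f) + \Omega(\|f\|_{\mathcal{H}})$ with $\Omega$ strictly increasing, then $f^*$ can be written as

$$f^* = \sum_{i=1}^{n} \alpha_i g_i,$$

where $g_i$ represents $L_i$ in $\mathcal{H}$, i.e. $L_i f = \langle f, g_i \rangle_{\mathcal{H}}$.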
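A small sketch of how the representers can be used in practice, under assumptions of my own: a one-dimensional Gaussian kernel, functionals that are point evaluations $f \mapsto f(x_i)$ plus one derivative evaluation $f \mapsto f'(z)$, and exact interpolation instead of a general loss. The data `x`, `y`, `z`, `d` and the bandwidth `s` are made up for illustration. The solution is sought in the span of the representers $k(x_i, \cdot)$ and $\partial_1 k(z, \cdot)$, and the coefficients come from the Gram matrix of these representers.

```python
import numpy as np

s = 0.8  # Gaussian bandwidth (arbitrary)

def k(x, y):
    return np.exp(-(x - y) ** 2 / (2 * s ** 2))

def d2k(x, y):
    """d/dy k(x, y)."""
    return (x - y) / s ** 2 * k(x, y)

def d1d2k(x, y):
    """d^2/(dx dy) k(x, y)."""
    return (1 / s ** 2 - (x - y) ** 2 / s ** 4) * k(x, y)

x = np.array([-1.0, 0.0, 1.2])   # points where values are prescribed
y = np.array([0.5, -0.3, 0.8])   # prescribed values f(x_i)
z = np.array([0.5])              # point where the derivative is prescribed
d = np.array([2.0])              # prescribed derivative f'(z)

# Gram matrix of the representers k(x_i, .) and d/dz k(z, .)
K = k(x[:, None], x[None, :])
D = d2k(x[:, None], z[None, :])
E = d1d2k(z[:, None], z[None, :])
G = np.block([[K, D], [D.T, E]])
coef = np.linalg.solve(G, np.concatenate([y, d]))
a, b = coef[:len(x)], coef[len(x):]

def f(t):
    """f = sum_i a_i k(x_i, .) + sum_j b_j d/dz k(z_j, .)"""
    t = np.atleast_1d(t)
    return k(x[None, :], t[:, None]) @ a + d2k(t[:, None], z[None, :]) @ b

def df(t):
    """Derivative of f."""
    t = np.atleast_1d(t)
    return d2k(x[None, :], t[:, None]) @ a + d1d2k(t[:, None], z[None, :]) @ b

print(f(x), df(z))  # should reproduce y and d
```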
Kernel Embeddings
Use PCA (principal component analysis)
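Given data $x_1, \dots, x_n \in \mathbb{R}^d$ with mean $\bar{x} = \frac{1}{n} \sum_{i} x_i$, we can compute the covariance matrix

$$C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T.$$

WLOG we can assume $\bar{x} = 0$, so $C = \frac{1}{n} \sum_{i} x_i x_i^T$. This is a symmetric positive semi-definite matrix. By orthogonal diagonalization, we can write

$$C = \sum_{j=1}^{d} \lambda_j u_j u_j^T$$

for an orthonormal basis $u_1, \dots, u_d$ and $\lambda_1 \ge \dots \ge \lambda_d \ge 0$. Here note that $u_j u_j^T$ just projects onto the one-dimensional subspace spanned by $u_j$.

We can then project $x_i$ to the space spanned by the orthonormal vectors $u_1, \dots, u_m$ with larger eigenvalues $\lambda_1, \dots, \lambda_m$, which preserves the majority of the variance and inner product:

$$P_m = \sum_{j=1}^{m} u_j u_j^T,$$

here $P_m$ can be viewed as an approximation of identity, since $\sum_{j=1}^{d} u_j u_j^T = I$.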
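The PCA steps above can be sketched in NumPy as follows. The synthetic data, the dimension and the choice `m = 2` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: 200 points in R^3 with very different spread per axis
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])
X = X - X.mean(axis=0)            # centre the data (the WLOG step)

C = X.T @ X / len(X)              # covariance matrix, d x d
lam, U = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]     # reorder so that lam[0] >= lam[1] >= ...
lam, U = lam[order], U[:, order]

m = 2
P = U[:, :m] @ U[:, :m].T         # projection onto the top-m eigenvectors
X_proj = X @ P                    # projected data (P is symmetric)

explained = lam[:m].sum() / lam.sum()
print("fraction of variance kept:", explained)
```

Summing all $d$ rank-one projections instead of the first `m` gives back the identity, which is the sense in which `P` approximates it.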
Lemma.
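Let $x_1, \dots, x_n$ be the centred data and let $P$ range over orthogonal projections of rank $m$. The projection $P_m$ onto the top $m$ eigenvectors of the covariance matrix satisfies
$P_m = \arg\max_{P} \sum_{i} \|P x_i\|^2$, picks the maximal variance directions.
$P_m = \arg\min_{P} \sum_{i} \|x_i - P x_i\|^2$, least moving, maximal inner product preservation.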
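A quick numerical sanity check of the lemma (not a proof): the PCA projection is compared against randomly drawn orthogonal projections of the same rank. The data and the helper names `top_m_projection`, `random_projection`, `captured` and `residual` are hypothetical, chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated data
X = X - X.mean(axis=0)

def top_m_projection(X, m):
    """Projection onto the m eigenvectors of the covariance with largest eigenvalues."""
    lam, U = np.linalg.eigh(X.T @ X / len(X))
    Um = U[:, np.argsort(lam)[::-1][:m]]
    return Um @ Um.T

def random_projection(d, m, rng):
    """Orthogonal projection onto a random m-dimensional subspace of R^d."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, m)))
    return Q @ Q.T

def captured(P):
    return np.sum((X @ P) ** 2)       # sum_i ||P x_i||^2

def residual(P):
    return np.sum((X - X @ P) ** 2)   # sum_i ||x_i - P x_i||^2

m = 2
P_pca = top_m_projection(X, m)
trials = [random_projection(4, m, rng) for _ in range(200)]
pca_is_best = all(
    captured(P_pca) >= captured(P) - 1e-9 and residual(P_pca) <= residual(P) + 1e-9
    for P in trials
)
print(pca_is_best)
```

Since `captured(P) + residual(P)` equals the total squared norm of the data for any orthogonal projection, maximizing one is the same as minimizing the other.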
Idea: Embed the points into some high dimensional space, and then do PCA in that space.
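Consider a map $\phi: X \to H$ into a Hilbert space $H$, or equivalently a linear map $\mathbb{R}^{(X)} \to H$ from the free vector space on $X$, by the free-forgetful adjunction. We wish that this map preserves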
Prop. If $\phi: X \to H$ is a map into any Hilbert space $H$, then

$$k(x, y) = \langle \phi(x), \phi(y) \rangle_{H}$$

defines a positive definite kernel on $X$.
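To illustrate the proposition, here is a minimal sketch with a hypothetical feature map $\phi(x) = (1, \sqrt{2}\,x, x^2)$ from $\mathbb{R}$ to $\mathbb{R}^3$ (my choice, not from the lecture). The induced kernel is $(1 + xy)^2$, and the Gram matrix on any finite set of points should have no negative eigenvalues up to rounding.

```python
import numpy as np

def phi(x):
    """Hypothetical feature map R -> R^3."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

def k(x, y):
    """Kernel induced by phi; equals (1 + x*y)**2."""
    return phi(x) @ phi(y)

pts = np.linspace(-2.0, 2.0, 7)
G = np.array([[k(a, b) for b in pts] for a in pts])   # Gram matrix
eigs = np.linalg.eigvalsh(G)
print(eigs.min())   # non-negative up to floating point error
```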
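If we have an RKHS $\mathcal{H}$ on $X$ with kernel $k$, then

$$\phi: X \to \mathcal{H}, \quad x \mapsto k(x, \cdot)$$

is the canonical feature map associated to $\mathcal{H}$.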
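Let $X$ be a set and $k$ a positive definite kernel on $X$. We get the associated RKHS $\mathcal{H}_k$ and canonical feature map $\phi: X \to \mathcal{H}_k$, $x \mapsto k(x, \cdot)$.
The images $\phi(x_i)$ of data points $x_i \in X$ are now in $\mathcal{H}_k$, and we can compute the inner product

$$\langle \phi(x_i), \phi(x_j) \rangle_{\mathcal{H}_k} = k(x_i, x_j).$$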
Kernel Trick
The kernel trick says that any algorithm working with points in Euclidean space that relies only on the Euclidean structure $\langle x, y \rangle$ can be generalized to a kernel setting by replacing all inner products $\langle x, y \rangle$ with the kernel function $k(x, y)$.
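As an example of the kernel trick, PCA can be written so that it only touches the Gram matrix of inner products, after which any kernel can be substituted. The sketch below is one common way to do this, assuming a Gaussian kernel and random data; the function names `gaussian_gram` and `kernel_pca` are hypothetical.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(D2, 0) / (2 * sigma ** 2))

def kernel_pca(K, m):
    """Coordinates of the n points along the top-m principal directions
    in feature space, computed from the Gram matrix K alone."""
    n = len(K)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                      # Gram matrix of centred feature vectors
    lam, V = np.linalg.eigh(Kc)
    idx = np.argsort(lam)[::-1][:m]     # indices of the m largest eigenvalues
    lam, V = lam[idx], V[:, idx]
    return V * np.sqrt(np.maximum(lam, 0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Z = kernel_pca(gaussian_gram(X, sigma=1.5), m=2)   # 50 x 2 embedding
print(Z.shape)
```

With the linear kernel `K = X @ X.T` on centred data, `kernel_pca` gives the ordinary PCA scores up to the sign of each component, which is a useful consistency check.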
Eiko Remark. This seems to be able to utilize the topology of $X$ and bring some non-linear structure into the game. The topology seems to be important here: if $k$ is given by arbitrary numbers and has nothing to do with the topology of $X$ (equivalently, if we choose the discrete topology), this is just putting an arbitrary inner product on the huge free vector space $\mathbb{R}^{(X)}$ on $X$. The point is that using the topology lowers the dimension we need to consider, without restricting us to the original space with its Euclidean structure.