Author: Eiko

Time: 2025-03-10 10:56:15 - 2025-03-10 10:56:15 (UTC)

Kernel Methods In Machine Learning - Lecture 4

Last time we discussed that, given a positive definite kernel $k : X \times X \to \mathbb{R}$,

we have a canonical feature map

$$\varphi_K : X \to H_K, \qquad x \mapsto k(x, \cdot).$$

Kernel Mean Embeddings

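Given a probability measure $\rho$, we can average the feature map over it. This defines the kernel mean embedding

$$\mu : P(\mathbb{R}^d) \to H_K,$$

$$\rho \mapsto \mu_\rho = \int k(x, \cdot)\, \rho(dx).$$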

Observation: composing with $x \in X \mapsto \delta_x \in P(X)$, we get back the feature map / kernel, since $\mu_{\delta_x} = k(x, \cdot)$.

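This means, since $f(x) = \langle k(x, \cdot), f \rangle_{H_K}$ for every $f \in H_K$, we have

$$\mathbb{E}_{X \sim \rho}[f(X)] = \int f \, d\rho = \int \langle k(x, \cdot), f \rangle_{H_K} \, d\rho(x) = \left\langle \int k(x, \cdot) \, d\rho(x), f \right\rangle_{H_K} = \langle \mu_\rho, f \rangle_{H_K}.$$

We see that pairing with $\mu_\rho$ evaluates the expectation under $\rho$ of any function $f \in H_K$.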
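The identity above can be checked numerically for an empirical measure. The following is a minimal sketch, assuming a Gaussian kernel on the real line and a function of the form $f = \sum_j a_j k(z_j, \cdot)$; the names `gaussian_kernel`, `rkhs_function`, `mean_embedding_pairing` and the sample values are illustrative choices, not part of the lecture.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the real line: exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def rkhs_function(centers, coeffs, kernel):
    """Return f = sum_j coeffs[j] * k(centers[j], .), an element of H_K."""
    return lambda x: sum(a * kernel(z, x) for z, a in zip(centers, coeffs))


def mean_embedding_pairing(samples, centers, coeffs, kernel):
    """<mu_rho_hat, f>_{H_K} for rho_hat = (1/n) sum_i delta_{x_i}.

    By bilinearity and <k(x, .), k(z, .)> = k(x, z), this equals
    (1/n) sum_i sum_j coeffs[j] * k(x_i, centers[j]).
    """
    n = len(samples)
    total = sum(a * kernel(x, z)
                for x in samples
                for z, a in zip(centers, coeffs))
    return total / n


# Illustrative data (assumed, not from the lecture).
samples = [-1.0, 0.5, 2.0]
centers = [0.0, 1.0]
coeffs = [2.0, -1.0]

f = rkhs_function(centers, coeffs, gaussian_kernel)
empirical_mean = sum(f(x) for x in samples) / len(samples)
pairing = mean_embedding_pairing(samples, centers, coeffs, gaussian_kernel)
```

Here `empirical_mean` is the left-hand side $\int f \, d\hat\rho$ and `pairing` is the right-hand side $\langle \mu_{\hat\rho}, f \rangle_{H_K}$; the two agree up to floating-point rounding.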

Maximum Mean Discrepancy

For $\rho, \pi \in P(X)$ we can define

$$\mathrm{MMD}(\rho, \pi) = \| \mu_\rho - \mu_\pi \|_{H_K}.$$

This amounts to pulling back the metric of the RKHS along the mean embedding to the space of probability measures on $X$.

We can compute

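$$\begin{aligned}
\mathrm{MMD}^2(\rho, \pi) &= \| \mu_\rho - \mu_\pi \|_{H_K}^2 = \langle \mu_\rho - \mu_\pi, \mu_\rho - \mu_\pi \rangle_{H_K} \\
&= \langle \mu_\rho, \mu_\rho \rangle_{H_K} + \langle \mu_\pi, \mu_\pi \rangle_{H_K} - 2 \langle \mu_\rho, \mu_\pi \rangle_{H_K} \\
&= \mathbb{E}_{X \sim \rho, Y \sim \rho}[k(X, Y)] + \mathbb{E}_{X \sim \pi, Y \sim \pi}[k(X, Y)] - 2\, \mathbb{E}_{X \sim \rho, Y \sim \pi}[k(X, Y)],
\end{aligned}$$

where $X$ and $Y$ are independent in each expectation.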

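We can estimate this from samples $x_1, \dots, x_n \sim \rho$ and $y_1, \dots, y_m \sim \pi$ by replacing the measures with their empirical versions

$$\rho \approx \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}, \qquad \pi \approx \frac{1}{m} \sum_{j=1}^{m} \delta_{y_j}.$$

This gives an easy estimator for the MMD:

$$\mathrm{MMD}^2(\rho, \pi) \approx \frac{1}{n^2} \sum_{i,j=1}^{n} k(x_i, x_j) + \frac{1}{m^2} \sum_{i,j=1}^{m} k(y_i, y_j) - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j).$$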
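The plug-in estimator above translates almost line by line into code. The following is a minimal sketch, assuming one-dimensional samples and a Gaussian kernel with bandwidth `sigma`; the function names and sample values are illustrative. Because the sums include the diagonal terms $i = j$, this is the biased (V-statistic) version of the estimator.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the real line: exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, kernel):
    """Plug-in estimate of MMD^2 from samples xs ~ rho and ys ~ pi.

    Computes mean k(x_i, x_j) + mean k(y_i, y_j) - 2 * mean k(x_i, y_j),
    with every pair (including i == j) entering the averages.
    """
    n, m = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (n * n)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (m * m)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


# Illustrative samples (assumed): one cluster near 0, one near 2.
xs = [0.0, 0.1, -0.2, 0.3]
ys = [2.0, 2.1, 1.8]

same = mmd2_biased(xs, xs, gaussian_kernel)  # a sample against itself
diff = mmd2_biased(xs, ys, gaussian_kernel)  # two well-separated samples
```

Comparing a sample with itself gives zero, while the two separated clusters give a clearly positive value. The cost is quadratic in the sample sizes, since every pair of points is evaluated.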

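Note that $\mathrm{MMD}(\rho, \pi) = 0$ need not imply $\rho = \pi$: the mean embedding is not necessarily faithful (injective), so in general the MMD is only a pseudometric.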

Definition. A kernel k is called characteristic, if the kernel mean embedding is injective.

Theorem. The Gaussian kernel (like most kernels used in applications, e.g. the p-kernels) is characteristic.
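To see concretely why the choice of kernel matters, consider the linear kernel $k(x, y) = xy$ on the real line, for which $\mathrm{MMD}^2(\rho, \pi) = (\mathbb{E}_\rho[X] - \mathbb{E}_\pi[Y])^2$ only compares means. The following is a minimal sketch with two hand-picked empirical measures that have the same mean but different spread; the kernels, function names and data are illustrative assumptions, not taken from the lecture.

```python
import math


def linear_kernel(x, y):
    """Linear kernel on the real line; its mean embedding only sees the mean."""
    return x * y


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the real line: exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, kernel):
    """Plug-in estimate of MMD^2 between the empirical measures of xs and ys."""
    n, m = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (n * n)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (m * m)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


# Two different empirical measures with the same mean (zero).
xs = [-1.0, 1.0]
ys = [-2.0, 2.0]

mmd2_linear = mmd2_biased(xs, ys, linear_kernel)      # cannot tell them apart
mmd2_gauss = mmd2_biased(xs, ys, gaussian_kernel)     # separates them
```

With the linear kernel the value is zero even though the two measures differ, so that kernel is not characteristic; the Gaussian kernel assigns them a strictly positive distance.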

We also have an outbedding problem: given $f \in H_K$, can you find $\rho$ such that $\mu_\rho = f$?

Other Stuff

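$$\int \psi(t)\, e^{is\Phi(t)}\, dt = e^{is\Phi(t_0)} \int \psi(t)\, e^{is(\Phi(t) - \Phi(t_0))}\, dt = e^{is\Phi(t_0)} \int \psi(t)\, e^{is\left(A(t - t_0)^2 + O(|t - t_0|^3)\right)}\, dt$$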
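As a sketch of the usual next step, under assumptions not stated above: suppose $t_0$ is the only stationary point of $\Phi$ on the support of $\psi$, that $A = \tfrac{1}{2} \Phi''(t_0) \neq 0$, that $\psi$ is smooth with compact support, and that $s > 0$. Dropping the cubic remainder and freezing $\psi$ at $t_0$ leaves a Fresnel integral in $u = t - t_0$, which suggests the leading-order behaviour as $s \to \infty$:

$$\int \psi(t)\, e^{is\Phi(t)}\, dt \approx e^{is\Phi(t_0)}\, \psi(t_0) \int_{-\infty}^{\infty} e^{isAu^2}\, du = e^{is\Phi(t_0)}\, \psi(t_0)\, \sqrt{\frac{\pi}{s|A|}}\; e^{i \frac{\pi}{4} \operatorname{sgn}(A)}.$$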