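Last time we discussed that, given a positive definite kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with reproducing kernel Hilbert space (RKHS) $\mathcal{H}$,
we have a canonical feature map
$$\varphi : \mathcal{X} \to \mathcal{H}, \qquad \varphi(x) = k(x, \cdot).$$
Given a probability measure $P$ on $\mathcal{X}$, we can average the feature map over it to get the mean embedding
$$\mu_P = \mathbb{E}_{x \sim P}[\varphi(x)] = \int_{\mathcal{X}} k(x, \cdot)\, dP(x) \in \mathcal{H}.$$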
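Observation: for the mean embedding $\mu_P = \mathbb{E}_{x \sim P}[k(x, \cdot)]$ and every $f \in \mathcal{H}$,
$$\langle \mu_P, f \rangle_{\mathcal{H}} = \mathbb{E}_{x \sim P}\big[\langle k(x, \cdot), f \rangle_{\mathcal{H}}\big].$$
This means, since $\langle k(x, \cdot), f \rangle_{\mathcal{H}} = f(x)$ by the reproducing property, that
$$\langle \mu_P, f \rangle_{\mathcal{H}} = \mathbb{E}_{x \sim P}[f(x)].$$
We see that $\mu_P$ can be used to evaluate the expectation under $P$ of any function in $\mathcal{H}$.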
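The identity between the inner product with the mean embedding and the expectation can be checked numerically on an empirical measure. The following is a minimal sketch, assuming a Gaussian kernel on $\mathbb{R}^2$ and a function of the form $f = \sum_j \alpha_j k(z_j, \cdot)$; the names `gaussian_kernel`, `z`, and `alpha` are illustrative choices and do not come from the lecture.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))   # samples x_i from P (assumed standard normal)
z = rng.normal(size=(5, 2))     # centres z_j defining f (hypothetical)
alpha = rng.normal(size=5)      # coefficients alpha_j (hypothetical)

# Left side: sample average of f(x_i), with f = sum_j alpha_j k(z_j, .)
f_at_x = gaussian_kernel(x, z) @ alpha
mean_of_f = f_at_x.mean()

# Right side: <mu_hat, f> = sum_j alpha_j mu_hat(z_j),
# where mu_hat(z) = (1/n) sum_i k(x_i, z) is the empirical mean embedding.
mu_hat_at_z = gaussian_kernel(x, z).mean(axis=0)
inner_product = alpha @ mu_hat_at_z

print(mean_of_f, inner_product)
```

The two printed numbers agree up to floating-point rounding, because for an empirical measure both sides are the same finite sum taken in a different order.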
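For probability measures $P$ and $Q$ on $\mathcal{X}$ with mean embeddings $\mu_P, \mu_Q \in \mathcal{H}$, we define the maximum mean discrepancy (MMD)
$$\mathrm{MMD}(P, Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}.$$
This amounts to pulling back the metric of the RKHS $\mathcal{H}$ to the space of probability measures on $\mathcal{X}$ along the map $P \mapsto \mu_P$.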
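We can compute, by expanding the squared norm and using $\langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal{H}} = k(x, y)$,
$$\mathrm{MMD}^2(P, Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}^2 = \mathbb{E}_{x, x' \sim P}[k(x, x')] - 2\, \mathbb{E}_{x \sim P,\, y \sim Q}[k(x, y)] + \mathbb{E}_{y, y' \sim Q}[k(y, y')],$$
which we can estimate from samples $x_1, \dots, x_n \sim P$ and $y_1, \dots, y_m \sim Q$, for instance as
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n^2} \sum_{i, j = 1}^{n} k(x_i, x_j) - \frac{2}{nm} \sum_{i = 1}^{n} \sum_{j = 1}^{m} k(x_i, y_j) + \frac{1}{m^2} \sum_{i, j = 1}^{m} k(y_i, y_j).$$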
This gives easy estimators for the MMD
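As a rough illustration of such estimators, the sketch below computes the squared MMD from two samples in two common ways: the plug-in (biased) version that averages over all pairs, and the unbiased version that drops the diagonal terms $i = j$. The Gaussian kernel, the bandwidth `sigma=1.0`, and the toy distributions are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_biased(x, y, sigma=1.0):
    """Plug-in estimate: squared RKHS distance between empirical embeddings."""
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    return kxx.mean() - 2.0 * kxy.mean() + kyy.mean()

def mmd2_unbiased(x, y, sigma=1.0):
    """U-statistic estimate: same as above but without the i == j terms."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx - 2.0 * kxy.mean() + term_yy

rng = np.random.default_rng(0)
x = rng.normal(size=(400, 1))                 # sample from N(0, 1)
y_same = rng.normal(size=(400, 1))            # second sample from N(0, 1)
y_shift = rng.normal(loc=1.0, size=(400, 1))  # sample from N(1, 1)

same = mmd2_unbiased(x, y_same)
shifted = mmd2_unbiased(x, y_shift)
print(same, shifted)
```

The unbiased estimate can be slightly negative when the two samples come from the same distribution, whereas the biased one is a squared norm and is therefore nonnegative up to rounding. Both cost $O((n + m)^2)$ kernel evaluations.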
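Note that $\mathrm{MMD}(P, Q) = 0$ does not in general imply $P = Q$: the map $P \mapsto \mu_P$ need not be injective, so the MMD is in general only a pseudometric on probability measures.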
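Definition. A kernel $k$ is called characteristic if the mean embedding map $P \mapsto \mu_P = \mathbb{E}_{x \sim P}[k(x, \cdot)]$ is injective on probability measures, i.e. $\mu_P = \mu_Q$ implies $P = Q$.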
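To see why injectivity of the mean embedding matters, it may help to compare a kernel that lacks it with one that has it. The sketch below is an illustration under assumptions not taken from the lecture: it uses the linear kernel $k(x, y) = xy$ on $\mathbb{R}$, whose mean embedding only records the mean of the distribution, and compares the uniform measure on $\{-1, 1\}$ with the point mass at $0$, which have the same mean.

```python
import numpy as np

def linear_kernel(a, b):
    """Gram matrix of k(x, y) = <x, y>; its mean embedding only sees the mean."""
    return a @ b.T

def gaussian_kernel(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_biased(x, y, kernel):
    """Squared RKHS distance between the two empirical mean embeddings."""
    return (kernel(x, x).mean()
            - 2.0 * kernel(x, y).mean()
            + kernel(y, y).mean())

# P: uniform on {-1, +1}; Q: point mass at 0. Different measures, equal means.
x = np.array([-1.0, 1.0] * 50).reshape(-1, 1)
y = np.zeros((100, 1))

mmd2_linear = mmd2_biased(x, y, linear_kernel)
mmd2_gauss = mmd2_biased(x, y, gaussian_kernel)
print(mmd2_linear, mmd2_gauss)
```

With the linear kernel the squared MMD is zero even though the two measures differ, while the Gaussian kernel gives a strictly positive value for the same pair.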
Theorem. The Gaussian kernel (and most kernels in application like the
We also have an outbedding problem, given