Kernel Methods In Machine Learning

Author: Eiko

Time: 2025-03-10 10:56:15 - 2025-03-10 10:56:15 (UTC)

Kernel Methods In Machine Learning - Lecture 4

Last time we discussed that, given a positive definite kernel $k : X \times X \to \mathbb{R}$, we have a canonical feature map

$φ_{K} : X \to H_{K}, \quad x \mapsto k (x, \cdot) .$
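To make the feature map concrete, here is a minimal sketch that stores an element of $H_{K}$ as a finite combination $f = \sum_{j} α_{j} k (z_{j}, \cdot)$, which is the dense subspace the RKHS is built from. The Gaussian kernel, its bandwidth `sigma`, and the helper names `feature_map`, `rkhs_inner` and `evaluate` are illustrative assumptions, not notation from the lecture. The inner product uses $⟨ k (z, \cdot), k (w, \cdot) ⟩_{H_{K}} = k (z, w)$, and the script checks the reproducing property $⟨ k (x, \cdot), f ⟩_{H_{K}} = f (x)$.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) on tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical representation: f = sum_j alpha_j k(z_j, .) is a list of (alpha_j, z_j) pairs.
def feature_map(x):
    """Canonical feature map phi(x) = k(x, .), i.e. the single term 1 * k(x, .)."""
    return [(1.0, x)]

def rkhs_inner(f, g, k=gaussian_kernel):
    """<f, g>_H = sum_i sum_j a_i b_j k(z_i, w_j), from <k(z, .), k(w, .)>_H = k(z, w)."""
    return sum(a * b * k(z, w) for a, z in f for b, w in g)

def evaluate(f, x, k=gaussian_kernel):
    """Pointwise evaluation f(x) = sum_j alpha_j k(z_j, x)."""
    return sum(a * k(z, x) for a, z in f)

f = [(0.5, (0.0,)), (-2.0, (1.0,)), (1.5, (3.0,))]
x = (0.7,)
# Reproducing property: both numbers agree up to rounding.
print(rkhs_inner(feature_map(x), f), evaluate(f, x))
```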

Kernel Mean Embeddings

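Given a probability measure $ρ$ on $X$, we can average the feature map over it, which gives the kernel mean embedding

$μ : P (X) \to H_{K}, \quad ρ \mapsto μ_{ρ} := \int k (x, \cdot) \, ρ (d x) ,$

well defined (as a Bochner integral in $H_{K}$) provided $\int \sqrt{k (x, x)} \, ρ (d x) < \infty$, e.g. for bounded kernels such as the Gaussian kernel.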

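Observation: for $x \in X$ the Dirac measure $δ_{x} \in P (X)$ has $μ_{δ_{x}} = k (x, \cdot) = φ_{K} (x)$, so the mean embedding recovers the feature map, and hence the kernel via $⟨ μ_{δ_{x}}, μ_{δ_{y}} ⟩_{H_{K}} = k (x, y)$.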

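Since $f (x) = ⟨ k (x, \cdot), f ⟩_{H_{K}}$ for every $f \in H_{K}$ (the reproducing property), we have

$\begin{aligned} E_{X \sim ρ} [f (X)] & = \int f \, d ρ \\ & = \int ⟨ k (x, \cdot), f ⟩_{H_{K}} \, d ρ (x) \\ & = ⟨ \int k (x, \cdot) \, d ρ (x), f ⟩_{H_{K}} \\ & = ⟨ μ_{ρ}, f ⟩_{H_{K}} , \end{aligned}$

where the third step moves the bounded linear functional $⟨ \cdot, f ⟩_{H_{K}}$ through the integral.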

So the mean embedding turns the expectation of any $f \in H_{K}$ under $ρ$ into a single inner product.
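The identity $E_{X \sim ρ} [f (X)] = ⟨ μ_{ρ}, f ⟩_{H_{K}}$ can be checked numerically when $ρ$ is an empirical measure $\frac{1}{n} \sum_{i} δ_{x_{i}}$, whose embedding is $\frac{1}{n} \sum_{i} k (x_{i}, \cdot)$. This is a minimal sketch assuming a Gaussian kernel with bandwidth 1, Gaussian samples, and a hypothetical $f = k (-1, \cdot) + 2 k (0.5, \cdot)$; all helper names are illustrative.

```python
import math
import random

def k(x, y, sigma=1.0):
    """Gaussian kernel on real numbers (assumed choice of kernel)."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def mean_embedding(samples):
    """Empirical mean embedding (1/n) sum_i k(x_i, .), stored as (weight, centre) pairs."""
    n = len(samples)
    return [(1.0 / n, x) for x in samples]

def inner(f, g):
    """<f, g>_H for finite combinations of kernel sections."""
    return sum(a * b * k(z, w) for a, z in f for b, w in g)

def evaluate(f, x):
    """f(x) = sum_j alpha_j k(z_j, x)."""
    return sum(a * k(z, x) for a, z in f)

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(500)]
f = [(1.0, -1.0), (2.0, 0.5)]  # hypothetical f = k(-1, .) + 2 k(0.5, .)

via_embedding = inner(mean_embedding(samples), f)                 # <mu_rho, f>_H
direct_average = sum(evaluate(f, x) for x in samples) / len(samples)  # (1/n) sum_i f(x_i)
print(via_embedding, direct_average)  # equal up to rounding
```

For an empirical measure both sides are the same finite sum, so they agree up to floating-point rounding; for a general $ρ$ the left side is what the sample average estimates.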

Maximum Mean Discrepancy

For $ρ, π \in P (X)$ we can define

$MMD (ρ, π) = ∥ μ_{ρ} - μ_{π} ∥_{H_{K}} .$

In other words, MMD pulls back the distance induced by the RKHS norm along $μ$ to the space of probability measures on $X$.

Expanding the square (with $X$ and $Y$ independent in each expectation) gives

$\begin{aligned} {MMD}^{2} (ρ, π) & = ∥ μ_{ρ} - μ_{π} ∥_{H_{K}}^{2} \\ & = ⟨ μ_{ρ} - μ_{π}, μ_{ρ} - μ_{π} ⟩_{H_{K}} \\ & = ⟨ μ_{ρ}, μ_{ρ} ⟩_{H_{K}} + ⟨ μ_{π}, μ_{π} ⟩_{H_{K}} - 2 ⟨ μ_{ρ}, μ_{π} ⟩_{H_{K}} \\ & = E_{X \sim ρ, Y \sim ρ} [k (X, Y)] + E_{X \sim π, Y \sim π} [k (X, Y)] - 2 E_{X \sim ρ, Y \sim π} [k (X, Y)] . \end{aligned}$

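Given samples $x_{1}, \dots, x_{n} \sim ρ$ and $y_{1}, \dots, y_{m} \sim π$, we can estimate this by replacing the measures with their empirical versions

$ρ \approx \frac{1}{n} \sum_{i = 1}^{n} δ_{x_{i}}, \quad π \approx \frac{1}{m} \sum_{j = 1}^{m} δ_{y_{j}} .$

Plugging these in gives a simple estimator of the MMD (biased, since it keeps the diagonal terms $i = j$):

${MMD}^{2} (ρ, π) \approx \frac{1}{n^{2}} \sum_{i, j = 1}^{n} k (x_{i}, x_{j}) + \frac{1}{m^{2}} \sum_{i, j = 1}^{m} k (y_{i}, y_{j}) - \frac{2}{n m} \sum_{i = 1}^{n} \sum_{j = 1}^{m} k (x_{i}, y_{j}) .$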
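The plug-in estimator translates directly into code. This is a minimal sketch for one-dimensional samples, assuming a Gaussian kernel with bandwidth 1; the function name `mmd2_biased` and the sample values are illustrative.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on real numbers (assumed choice of kernel and bandwidth)."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def mmd2_biased(xs, ys, k=gaussian_kernel):
    """Plug-in estimate of MMD^2 from samples xs ~ rho and ys ~ pi.

    Computes (1/n^2) sum k(x_i, x_j) + (1/m^2) sum k(y_i, y_j) - (2/(n m)) sum k(x_i, y_j),
    keeping the diagonal terms i == j (the biased version).
    """
    n, m = len(xs), len(ys)
    k_xx = sum(k(a, b) for a in xs for b in xs) / n ** 2
    k_yy = sum(k(a, b) for a in ys for b in ys) / m ** 2
    k_xy = sum(k(a, b) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy

xs = [0.1, -0.4, 0.8, 0.3]
ys = [2.1, 2.5, 1.7]
print(mmd2_biased(xs, xs))  # identical samples: zero
print(mmd2_biased(xs, ys))  # well-separated samples: clearly positive
```

Because this estimate is exactly $∥ \hat{μ}_{ρ} - \hat{μ}_{π} ∥_{H_{K}}^{2}$ for the empirical embeddings, it is never negative; the cost is $O ((n + m)^{2})$ kernel evaluations.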

Note that $MMD (ρ, π) = 0$ does not in general imply $ρ = π$: the mean embedding need not be injective, so MMD is in general only a pseudometric.

Definition. A kernel $k$ is called characteristic if its kernel mean embedding $μ$ is injective (equivalently, MMD is a metric on $P (X)$).

Theorem. The Gaussian kernel on $\mathbb{R}^{d}$ (and most kernels used in applications, such as the $p$-kernels) is characteristic.
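A small example of why the characteristic property matters: for the linear kernel $k (x, y) = x y$ on $\mathbb{R}$ we get $μ_{ρ} (y) = E_{ρ} [X] \, y$, so MMD only compares means. The sketch below uses the discrete measures $ρ = \frac{1}{2} (δ_{-1} + δ_{1})$ and $π = δ_{0}$ (our own choice), which have the same mean but differ; for such measures the plug-in formula is exact rather than an estimate. The Gaussian bandwidth of 1 is an arbitrary assumption.

```python
import math

def linear_kernel(x, y):
    """Linear kernel k(x, y) = x y: not characteristic."""
    return x * y

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel: characteristic on R^d."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def mmd2(xs, ys, k):
    """Squared MMD between the uniform discrete measures on xs and ys."""
    n, m = len(xs), len(ys)
    return (sum(k(a, b) for a in xs for b in xs) / n ** 2
            + sum(k(a, b) for a in ys for b in ys) / m ** 2
            - 2.0 * sum(k(a, b) for a in xs for b in ys) / (n * m))

rho = [-1.0, 1.0]  # rho = (delta_{-1} + delta_{1}) / 2
pi = [0.0]         # pi = delta_0: same mean, different distribution
print(mmd2(rho, pi, linear_kernel))    # 0.0: the linear kernel cannot tell them apart
print(mmd2(rho, pi, gaussian_kernel))  # about 0.355: the Gaussian kernel can
```

Here the Gaussian value is exactly $\frac{3}{2} + \frac{1}{2} e^{-2} - 2 e^{-1/2}$.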

There is also an inverse ("outbedding") problem: given $f \in H_{K}$, can we find $ρ \in P (X)$ such that $μ_{ρ} = f$?

Other Stuff

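For an oscillatory integral whose phase $Φ$ has a stationary point $t_{0}$, i.e. $Φ' (t_{0}) = 0$, Taylor expanding with $A = Φ'' (t_{0}) / 2$ gives

$\begin{aligned} \int ψ (t) e^{i s Φ (t)} \, d t & = e^{i s Φ (t_{0})} \int ψ (t) e^{i s (Φ (t) - Φ (t_{0}))} \, d t \\ & = e^{i s Φ (t_{0})} \int ψ (t) e^{i s (A (t - t_{0})^{2} + O (| t - t_{0} |^{3}))} \, d t . \end{aligned}$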
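As a sanity check of where this expansion leads, take the hypothetical choice $ψ (t) = e^{-t^{2}}$ and $Φ (t) = t^{2}$, so $t_{0} = 0$ and $A = 1$. The integral then has the closed form $\sqrt{π / (1 - i s)}$ (a complex Gaussian integral), and as $s \to \infty$ it should approach the standard leading-order stationary-phase term $ψ (t_{0}) \, e^{i s Φ (t_{0})} \sqrt{π / (s A)} \, e^{i π / 4}$ (for $A > 0$). The sketch below compares the two; the test functions are our own choice.

```python
import cmath
import math

def exact_integral(s):
    """Integral over R of exp(-t^2) * exp(i s t^2).

    Gaussian integral with a = 1 - i s (Re a > 0): sqrt(pi / a), principal branch.
    """
    return cmath.sqrt(math.pi / (1 - 1j * s))

def stationary_phase_leading_term(s, A=1.0, psi_t0=1.0, phi_t0=0.0):
    """psi(t0) e^{i s Phi(t0)} sqrt(pi / (s A)) e^{i pi / 4}, valid for A > 0."""
    return (psi_t0 * cmath.exp(1j * s * phi_t0)
            * math.sqrt(math.pi / (s * A)) * cmath.exp(1j * math.pi / 4))

for s in (10.0, 100.0, 1000.0):
    exact = exact_integral(s)
    approx = stationary_phase_leading_term(s)
    # Relative error shrinks roughly like 1 / (2 s) in this example.
    print(s, abs(exact - approx) / abs(exact))
```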