In finite dimensions, the Gaussian distribution on \(\mathbb{R}^d\) is defined by the density function
\[p(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\]
where \(\mu\in\mathbb{R}^d\) is the mean vector and \(\Sigma = (\mathrm{Cov}(x_i,x_j))_{i,j=1}^d\) is the (symmetric positive definite) covariance matrix.
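A minimal sketch of this density in code; the mean \(\mu\), covariance \(\Sigma\), and query point below are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # symmetric positive definite
x = np.array([0.5, 0.5])

# Density via the closed-form expression above
d = len(mu)
diff = x - mu
p = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / (
    (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

# Same value via scipy, as a sanity check
assert np.isclose(p, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```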
Definition. Let \(X\neq \varnothing\) be a set, let \(m:X\to \mathbb{R}\) be a mean function, and let \(k:X\times X\to \mathbb{R}\) be a covariance function.
A random function \(f:X\to \mathbb{R}\) (equivalently, \(f:\Omega\times X\to \mathbb{R}\)) is called a Gaussian process with mean \(m\) and covariance \(k\) if for any finite set of points \(\mathcal{D}=\{x_1,\dots,x_n\}\subset X\), the vector \((f(x_1),\dots,f(x_n))\) is jointly Gaussian with mean \((m(x_1),\dots,m(x_n))\) and covariance matrix \((k(x_i,x_j))_{i,j=1}^n\).
\[\mathbb{E}f(x_i) = m(x_i),\quad \mathrm{Cov}(f(x_i),f(x_j)) = k(x_i,x_j).\]
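The definition is directly computable: at any finite set of points, the GP reduces to a multivariate Gaussian that we can sample. The squared-exponential kernel below is one illustrative choice of \(k\), not prescribed by the text.

```python
import numpy as np

def k(x, y, lengthscale=1.0):
    # illustrative squared-exponential covariance function
    return np.exp(-0.5 * (x - y) ** 2 / lengthscale ** 2)

m = lambda x: 0.0                       # zero mean function

xs = np.linspace(0.0, 5.0, 50)          # points x_1, ..., x_n
mean = np.array([m(x) for x in xs])
K = np.array([[k(xi, xj) for xj in xs] for xi in xs])

# One draw of (f(x_1), ..., f(x_n)) ~ N(mean, K); the jitter term
# stabilizes the factorization of K numerically
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mean, K + 1e-10 * np.eye(len(xs)))
```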
The covariance matrix \((k(x_i,x_j))_{i,j=1}^n\) has to be symmetric positive semi-definite for every finite \(\mathcal{D}\subset X\).
Conversely, for any \(m:X\to \mathbb{R}\) and any positive semi-definite kernel \(k:X\times X\to \mathbb{R}\), there exists a Gaussian process with these characteristics (by the Kolmogorov extension theorem), which we denote by \(\mathrm{GP}(m,k)\).
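A small numerical check of this requirement, using the same illustrative kernel: the kernel matrix built from any finite \(\mathcal{D}\) should be symmetric with nonnegative eigenvalues.

```python
import numpy as np

def k(x, y):
    return np.exp(-0.5 * (x - y) ** 2)

D = np.array([0.0, 0.3, 1.0, 2.5])      # a finite subset of X
K = np.array([[k(xi, xj) for xj in D] for xi in D])

assert np.allclose(K, K.T)              # symmetry
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-12)        # PSD, up to round-off
```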
Take \(x, y\in X\) with \(x\neq y\); then \(\mathrm{Cov}(f(x),f(y)) = k(x,y)\).
Set \(m=0\) for simplicity. Then \((f(x),f(y))^T \in \mathbb{R}^2\) is Gaussian with mean zero and covariance matrix
\[\begin{pmatrix} k(x,x) & k(x,y) \\ k(y,x) & k(y,y) \end{pmatrix}\]
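For concreteness, with the same illustrative kernel the \(2\times 2\) covariance matrix above shows how nearby points are strongly correlated.

```python
import numpy as np

def k(x, y):
    return np.exp(-0.5 * (x - y) ** 2)

x, y = 0.0, 0.5
C = np.array([[k(x, x), k(x, y)],
              [k(y, x), k(y, y)]])
corr = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])
print(C, corr)   # corr ≈ 0.88 for |x - y| = 0.5
```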
Given data \((x_i, y_i)_{i=1}^N\), the goal is to find a function \(f:X\to \mathbb{R}\) with \(f(x_i)\approx y_i\).
In the Bayesian approach, we place a prior on the unknown function, \(f\sim \mathrm{GP}(m,k)\), and condition on the observed data to obtain a posterior distribution over functions.
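A hedged sketch of this conditioning step, using the standard GP regression formulas for observations \(y_i = f(x_i) + \varepsilon_i\) with Gaussian noise; the kernel, noise scale \(\sigma\), and toy data below are all illustrative assumptions.

```python
import numpy as np

def k(a, b):
    # illustrative squared-exponential kernel on 1-d inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

X = np.array([-2.0, -1.0, 0.5, 2.0])    # observed inputs x_i
y = np.sin(X)                           # observed targets y_i (toy data)
Xs = np.linspace(-3, 3, 100)            # test points
sigma = 0.1                             # assumed noise scale

K = k(X, X) + sigma ** 2 * np.eye(len(X))
Ks = k(X, Xs)
Kss = k(Xs, Xs)

# Posterior of f at the test points, conditioned on (X, y):
#   mean = Ks^T (K + sigma^2 I)^{-1} y
#   cov  = Kss - Ks^T (K + sigma^2 I)^{-1} Ks
alpha = np.linalg.solve(K, y)
post_mean = Ks.T @ alpha
post_cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
```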