Probabilistic Category

Author: Eiko

Time: 2025-02-20 13:02:46 - 2025-12-18 23:53:09 (UTC)

References:

Markov Category on nLab
A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics

Markov Category

Markov category gives a way to write probability theory that is ‘decoupled’ from a particular implementation (i.e. discrete distributions, continuous ones, or general measure theoretic ones). It is an interesting abstraction where you use morphisms to talk about random variables.

Formal Definition

A markov category is a semi-cartesian symmetric monoidal category \((\mathcal{C}, \otimes, 1)\) that supplies cocommutative comonoids.

The means:

monoidal: A monoidal structure is a ‘tensor product’ \(\otimes:\mathcal{C}\times \mathcal{C}\to \mathcal{C}\) and a unit \(1\in \mathcal{C}\) such that associativity and unit laws are satisfied. In principle this would need several gluing morphisms to describe the commutative diagrams involved.
semi-cartesian: The unit \(1\in \mathcal{C}\) of the monoidal category is a terminal object. This means \(\otimes\) acts ‘semi’-like products: we have both projections.

The category will have all counits \(\varepsilon_X: X\to 1\) and all comultiplications \(\Delta_X:X\to X\otimes X\). Typically the counit is called the ‘discard’ map (and is denoted by exclamation mark, or bang \(\!\)) and the comultiplication is called the copy map.
symmetric: The monoidal structure is commutative.
supplying: A symmetric monoidal category \(\mathcal{C}\) supplies a property \(P\) if every object has a \(P\) structure that is compatible with tensor product.
comonoids: A comonoid \((X,\Delta_X, !_X)\) in a monoidal category is equipped with a
- comultiplication: \(\Delta_X: X\to X\otimes X\) (alias: copy).
  
  In markov categories, this copies the result of a random variable.
- counit: \(!_X: X\to 1\) (alias: discard)
They satisfy coassociativity and counitality axioms.
cocommutative: The comultiplication is commutative, i.e. \(\sigma_{X,X}\circ \Delta_X = \Delta_X\) where \(\sigma_{X,X}: X\otimes X\to X\otimes X\) is the symmetry isomorphism.

First Concepts

A Distribution Is A Morphism From Unit

In this language, a distribution on a space \(X\) is thought as a ‘random-element’ in \(X\), corresponding to a morphism \(\psi: I\to X\). This description hides the ‘implementation’ of the distribution (e.g. discrete, continuous, etc).

To discuss randomness here, you don’t need to even talk about real numbers!

Random Variables Are Morphisms

If \((X,\psi)\) is a space with distribution, in classical probability theory there is a push-forward measure \(f_* \psi\) on \(Y\) for a measurable function \(f:X\to Y\). In markov categories, this is simply the composition of morphisms: \(f\circ \psi: I\to Y\).

Deterministic Maps Are Comonoid Homomorphisms

A morphism \(f:X\to Y\) is called deterministic if it preserves the comonoid structure, i.e. \[(f \otimes f) \circ \Delta_X = \Delta_Y \circ f\] literally this means, \(f\) has no randomness, sampling twice is the same as sampling once and copying the result.

Remark: The subcategory \(\mathcal{C}_{\det}\) of deterministic morphisms is cartesian monoidal. Cartesian is a synonym for deterministic in the context of Markov categories.

Joint Distributions Are Monoidal Products

For two spaces \(X,Y\), a joint distribution of \((X,Y)\) is thought as a morphism \(\psi: I\to X\otimes Y\). For two distributions \(\psi_X: I\to X\) and \(\psi_Y: I\to Y\), their (independent) product distribution is given by the monoidal product \(\psi_X \otimes \psi_Y: I\to X\otimes Y\).

Marginalization Is Composition With Discard

If we have a joint distribution \(\psi: I\to X\otimes Y\), the marginal distribution on \(X\) is given by composing with the discard map on \(Y\): \[\psi_X = (\mathrm{id}_X \otimes !_Y) \circ \psi: I\to X\]

Examples

`FinStoch`

The category of finite sets and stochastic matrices is a markov category. Here, the monoidal product is given by cartesian product of sets, the unit is the singleton set, the copy map duplicates the value, and the discard map forgets the value.