Author: Eiko
Tags: machine learning, mathematics, probability
Time: 2024-11-13 17:49:27 - 2024-11-13 17:49:27 (UTC)
Markov Reward Process
The states and rewards are given by random variables $S_t$ and $R_{t+1}$, with transition probabilities determined by
$$P(S_{t+1} = s',\, R_{t+1} = r \mid S_t = s),$$
i.e. the reward depends on both $S_t$ and $S_{t+1}$. You can compute the expected state reward, denoted by $R(s)$, as
$$R(s) = \mathbb{E}[R_{t+1} \mid S_t = s] = \sum_{s', r} r\, P(s', r \mid s).$$
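The expected state reward can be computed directly from the joint next-state/reward distribution. A minimal sketch, on a hypothetical 2-state MRP with a small finite set of rewards (the states, probabilities, and rewards below are made up for illustration):

```python
# P[s] maps (next_state, reward) -> probability, conditioned on current state s.
# Each row's probabilities sum to 1.
P = {
    0: {(0, 1.0): 0.5, (1, 2.0): 0.5},
    1: {(0, 0.0): 0.25, (1, 4.0): 0.75},
}

def expected_reward(s):
    """R(s) = sum over (s', r) of r * P(s', r | s)."""
    return sum(p * r for (_, r), p in P[s].items())

print(expected_reward(0))  # 0.5*1.0 + 0.5*2.0  = 1.5
print(expected_reward(1))  # 0.25*0.0 + 0.75*4.0 = 3.0
```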
The return is a random variable which counts the future rewards, not just a single-step reward. It can take many forms, but their basic ideas are the same. One common form is
$$G_t = \sum_{k=t}^{T} R_{k+1},$$
where $T$ is some stopping time. Another common form is the geometrically discounted return
$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},$$
where $\gamma \in [0, 1)$ is the discount factor.
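Both forms of return can be sketched on a hypothetical finite reward sequence (rewards beyond the episode are taken to be zero, so the infinite discounted sum truncates):

```python
rewards = [1.0, 0.0, 2.0, 3.0]  # hypothetical episode rewards R_1, R_2, ...
gamma = 0.5                     # discount factor

# Undiscounted return up to a stopping time T (here, the end of the episode).
undiscounted = sum(rewards)

# Geometrically discounted return: G = sum_k gamma^k * R_{k+1}.
discounted = sum(gamma**k * r for k, r in enumerate(rewards))

print(undiscounted)  # 6.0
print(discounted)    # 1.0 + 0.0 + 0.25*2.0 + 0.125*3.0 = 1.875
```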
The value function is the expected return starting from state $s$,
$$V(s) = \mathbb{E}[G_t \mid S_t = s],$$
which is a version of $R(s)$ that accounts for all future rewards rather than a single step.
The Bellman equation is a recursive equation for the value function:
$$V(s) = R(s) + \gamma \sum_{s'} P(s' \mid s)\, V(s').$$
In matrix form, $V = R + \gamma P V$, where $P$ is the transition matrix of $S_t$, so you can solve it by $V = (I - \gamma P)^{-1} R$. For $\gamma < 1$, the matrix $I - \gamma P$ is always invertible because the spectral radius of $\gamma P$ is less than $1$.
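The matrix solution can be sketched on a hypothetical 2-state MRP (the transition matrix, rewards, and discount below are made-up numbers). For the $2 \times 2$ case the inverse is written out explicitly via the adjugate formula, and the result is cross-checked against fixed-point iteration of $V \leftarrow R + \gamma P V$, which converges because $\gamma < 1$:

```python
gamma = 0.9
P = [[0.8, 0.2],
     [0.3, 0.7]]   # transition matrix, rows sum to 1
R = [1.0, 2.0]     # expected state rewards R(s)

# Solve V = (I - gamma*P)^{-1} R for the 2x2 case via the adjugate formula.
a = 1 - gamma * P[0][0]
b =    -gamma * P[0][1]
c =    -gamma * P[1][0]
d = 1 - gamma * P[1][1]
det = a * d - b * c  # nonzero since the spectral radius of gamma*P is < 1
V_exact = [( d * R[0] - b * R[1]) / det,
           (-c * R[0] + a * R[1]) / det]

# Cross-check: iterate the Bellman equation V <- R + gamma * P V.
V = [0.0, 0.0]
for _ in range(1000):
    V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(2))
         for s in range(2)]

print(V_exact)  # both methods should agree
print(V)
```

The direct solve is exact but costs a matrix inversion; the iteration is what scales to large state spaces, and its geometric convergence rate is exactly the $\gamma < 1$ contraction argument above.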