Markov Decision Process
A Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics, and manufacturing. MDPs are named after the Russian mathematician Andrey Markov.
– States s, beginning with initial state s0
• Each state s has actions A(s) available from it
– Transition model P(s' | s, a)
• Markov assumption: the probability of going to s' from s depends only on s (and the action a) and not on any of the previous states
– Reward function R(s)
– Policy π(s): the action that an agent takes in any given state
• The "solution" to an MDP
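The components listed above can be sketched in code. The following is a minimal illustration, not part of the lecture: a hypothetical two-state MDP with states, per-state actions A(s), a transition model P(s' | s, a), a reward function R(s), and a fixed policy π(s), evaluated by repeatedly applying the Bellman expectation backup. All state names, probabilities, and rewards are made up for the example.

```python
# Illustrative two-state MDP; all names and numbers are assumptions
# chosen for the example, not taken from the lecture.

states = ["s0", "s1"]  # states, with initial state s0

# Actions A(s) available from each state
actions = {"s0": ["stay", "go"], "s1": ["stay"]}

# Transition model P(s' | s, a). The Markov assumption: these
# probabilities depend only on the current state s and action a,
# not on any earlier states.
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
}

# Reward function R(s)
R = {"s0": 0.0, "s1": 1.0}

# A policy pi(s): the action the agent takes in each state.
# It is the MDP's "solution" only if it maximizes expected return.
pi = {"s0": "go", "s1": "stay"}

def policy_value(pi, gamma=0.9, iters=100):
    """Evaluate a fixed policy by iterating the Bellman backup
    V(s) = R(s) + gamma * sum over s' of P(s'|s,pi(s)) * V(s')."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: R[s] + gamma * sum(p * V[s2]
                                   for s2, p in P[(s, pi[s])].items())
             for s in states}
    return V

V = policy_value(pi)
print(V)  # expected discounted reward from each state under pi
```

With discount factor gamma = 0.9, the value of s1 converges toward R(s1) / (1 - gamma) = 10, and s0 earns slightly less because it must first transition into s1.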
Continue at: https://www.cs.unc.edu/~lazebnik/fall10/lec22_mdp.pdf
The text above belongs to the site linked above. Only a small excerpt of the material is shown here; please follow the link for the full lecture.