Markov Decision Process

A Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s;[1] a core body of research on Markov decision processes resulted from Howard's 1960 book, Dynamic Programming and Markov Processes.[2] They are used in many disciplines, including robotics, automatic control, economics, and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov.

• Components:
– States  s, beginning with initial state s0

– Actions

• Each state s has actions A(s) available from it

– Transition model P(s′ | s, a)

• Markov assumption: the probability of going to s′ from s depends only on s and not on any of the previous states

– Reward function R(s)

• Policy

π(s): the action that an agent takes in any given state

– The “solution” to an MDP
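
The components above can be made concrete with a small sketch. The toy MDP below (two states, two actions, and the numeric rewards and discount factor are all illustrative assumptions, not taken from the text) finds the policy π(s) by value iteration, one standard dynamic-programming way to "solve" an MDP:

```python
# Hypothetical two-state MDP: states, actions A(s), transition model
# P(s' | s, a), and reward function R(s). All numbers are made up for
# illustration.
states = ["s0", "s1"]
actions = {"s0": ["stay", "move"], "s1": ["stay", "move"]}
P = {
    ("s0", "stay"): [("s0", 1.0)],
    ("s0", "move"): [("s1", 0.8), ("s0", 0.2)],  # "move" can fail (stochastic outcome)
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "move"): [("s0", 1.0)],
}
R = {"s0": 0.0, "s1": 1.0}   # reward function R(s)
gamma = 0.9                  # discount factor

# Value iteration: repeatedly apply the Bellman update
# V(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) V(s')
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: R[s] + gamma * max(
             sum(p * V[s2] for s2, p in P[(s, a)]) for a in actions[s])
         for s in states}

# The policy pi(s): in each state, pick the action with the best
# expected discounted value.
pi = {s: max(actions[s],
             key=lambda a: sum(p * V[s2] for s2, p in P[(s, a)]))
      for s in states}
print(pi)  # the agent moves toward the rewarding state, then stays
```

Note how the transition model only ever conditions on the current state s and the chosen action a, which is exactly the Markov assumption from the list above.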


Manostaxx – Industrial Management Consulting
