Optimal Policies for MDPs
In an MDP, a policy specifies which action to take in each state. An optimal policy is one that, in every state, chooses an action maximizing the expected "return" (or "utility") obtained from that state onward. The problem is to find such an optimal policy for a given MDP.
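One common way to make "maximizes the return" precise is the infinite-horizon discounted formulation below. The notation is assumed for illustration and is not taken from this page: states $s \in S$, actions $a \in A$, transition probabilities $P(s' \mid s, a)$, expected immediate reward $R(s, a)$, and a discount factor $\gamma \in [0, 1)$. Under these assumptions the optimal value function $V^*$ satisfies the Bellman optimality equation, and any policy that acts greedily with respect to $V^*$ is optimal.

```latex
\begin{aligned}
V^{\pi}(s) &= \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R\big(s_t, \pi(s_t)\big) \,\Big|\, s_0 = s\Big] \\
V^{*}(s) &= \max_{a \in A} \Big[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big] \\
\pi^{*}(s) &\in \arg\max_{a \in A} \Big[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big]
\end{aligned}
```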
Parameters
- : number of states
- : number of actions
- n: transition matrix size =
- Let
Related Problems
Algorithms Table
| Algorithm | Year |
|---|---|
| Puterman Modified Policy Iteration (MPI) | 1974 |
| Howard Policy Iteration (PI) | 1960 |
| Bellman Value Iteration (VI) | 1957 |
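The three algorithms in the table can be sketched in pure Python as follows. This is a minimal illustration under assumptions that do not come from the source: a small tabular MDP encoded as hypothetical nested lists `P[s][a]` (a list of `(probability, next_state)` pairs) and `R[s][a]` (expected immediate reward), a discount factor `gamma` below 1, a starting policy that picks action 0 everywhere, and Gauss-Jordan elimination for the exact evaluation step of policy iteration. The helper names (`q_value`, `greedy`, `improve`, `evaluate_exact`) are illustrative, not from any library.

```python
from typing import List, Tuple

# Hypothetical tabular MDP encoding (an assumption for this sketch):
#   P[s][a] -> list of (probability, next_state) pairs
#   R[s][a] -> expected immediate reward for action a in state s
#   gamma   -> discount factor, 0 <= gamma < 1
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def q_value(P: Transitions, R: Rewards, gamma: float, V: List[float], s: int, a: int) -> float:
    """One-step lookahead: R(s,a) + gamma * sum over s' of P(s'|s,a) * V(s')."""
    return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])


def greedy(P, R, gamma, V):
    """In every state, pick an action with the largest one-step lookahead value."""
    return [max(range(len(P[s])), key=lambda a: q_value(P, R, gamma, V, s, a))
            for s in range(len(P))]


def value_iteration(P, R, gamma, tol=1e-10):
    """Bellman VI: repeat the optimality backup until the values stop changing."""
    V = [0.0] * len(P)
    while True:
        new_V = [max(q_value(P, R, gamma, V, s, a) for a in range(len(P[s])))
                 for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V, greedy(P, R, gamma, new_V)
        V = new_V


def evaluate_exact(P, R, gamma, pi):
    """Solve the linear system (I - gamma * P_pi) V = R_pi by Gauss-Jordan elimination."""
    n = len(P)
    # Augmented matrix [I | R_pi], then subtract gamma * P_pi from the left block.
    A = [[1.0 if i == j else 0.0 for j in range(n)] + [R[i][pi[i]]] for i in range(n)]
    for i in range(n):
        for p, t in P[i][pi[i]]:
            A[i][t] -= gamma * p
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


def improve(P, R, gamma, V, pi, eps=1e-12):
    """Switch an action only when another is strictly better (avoids cycling on ties)."""
    new_pi = list(pi)
    for s in range(len(P)):
        best = max(range(len(P[s])), key=lambda a: q_value(P, R, gamma, V, s, a))
        if q_value(P, R, gamma, V, s, best) > q_value(P, R, gamma, V, s, pi[s]) + eps:
            new_pi[s] = best
    return new_pi


def policy_iteration(P, R, gamma):
    """Howard PI: evaluate the current policy exactly, then improve it greedily."""
    pi = [0] * len(P)
    while True:
        V = evaluate_exact(P, R, gamma, pi)
        new_pi = improve(P, R, gamma, V, pi)
        if new_pi == pi:
            return V, pi
        pi = new_pi


def modified_policy_iteration(P, R, gamma, m=5, tol=1e-10):
    """MPI: greedy improvement followed by only m sweeps of policy evaluation."""
    V = [0.0] * len(P)
    while True:
        pi = greedy(P, R, gamma, V)
        backup = [q_value(P, R, gamma, V, s, pi[s]) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(backup, V)) < tol:
            return backup, pi
        V = backup
        for _ in range(m - 1):  # the first sweep was the backup above
            V = [q_value(P, R, gamma, V, s, pi[s]) for s in range(len(P))]
```

For instance, with a two-state MDP where `P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]`, `R = [[0.0, 1.0], [2.0, 0.0]]` and `gamma = 0.9`, all three functions should return the policy `[1, 0]` (move from state 0 to state 1, then stay) with values close to 19 and 20. Setting `m=1` in `modified_policy_iteration` reduces it to value iteration, while larger `m` moves it toward the exact evaluation used by policy iteration.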
Reductions Table
Insufficient data to display table.
Other relevant algorithms
Insufficient data to display table.