Optimal Policies for MDPs
Latest revision as of 09:11, 28 April 2023
Description
In an MDP (Markov decision process), a policy is a choice of which action to take at each state. An optimal policy is a policy that always chooses the action maximizing the "return" (also called "utility") of the current state. The problem is to find such an optimal policy for a given MDP.
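To make the definition concrete, here is a minimal value-iteration sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the encoding `P[s][a]` (a list of `(probability, next_state, reward)` triples), the discount factor `gamma`, the tolerance `tol`, and the two-state example MDP are all hypothetical choices made for this sketch and do not come from the page.

```python
# Hypothetical MDP encoding (an assumption of this sketch):
# P[s][a] is a list of (probability, next_state, reward) triples.
def value_iteration(P, gamma=0.9, tol=1e-10, max_sweeps=10_000):
    """Return (V, policy), where policy is greedy with respect to V."""
    n = len(P)
    V = [0.0] * n

    def q(s, a):
        # Expected return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n):
            best = max(q(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update; later states see the new value
        if delta < tol:
            break

    # The policy picks, in each state, an action with the highest expected return.
    policy = [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n)]
    return V, policy


# Illustrative two-state MDP (made up for this example).
# State 0: action 0 stays in 0 (reward 0); action 1 moves to 1 (reward 1).
# State 1: action 0 stays in 1 (reward 2); action 1 moves to 0 (reward 0).
P_example = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],
]

V, policy = value_iteration(P_example)
print(policy)  # action index chosen in each state
```

In this toy MDP, staying in state 1 forever is worth 2 / (1 - 0.9) = 20, so the greedy choice in state 0 is to move to state 1 (worth 1 + 0.9 * 20 = 19).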
Parameters
$n$: number of states
Table of Algorithms
Name | Year | Time | Space | Approximation Factor | Model | Reference |
---|---|---|---|---|---|---|
Bellman Value Iteration (VI) | 1957 | $O(2^n)$ | $O(n)$ | Exact | Deterministic | Time |
Howard Policy Iteration (PI) | 1960 | $O(n^{3})$ | $O(n)$ | Exact | Deterministic | Time |
Puterman Modified Policy Iteration (MPI) | 1974 | $O(n^{3})$ | $O(n)$ | Exact | Deterministic | |
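As a rough illustration of the policy-iteration idea named in the table, the sketch below alternates policy evaluation and greedy policy improvement. Several things here are assumptions of the sketch rather than details from the page: the encoding `P[s][a]` as a list of `(probability, next_state, reward)` triples, the discount factor `gamma`, the improvement margin `margin`, and the three-state example. Note also that evaluation is done by repeated sweeps to a tolerance instead of an exact linear solve, so this is a simplified variant, not necessarily the procedure analyzed in the cited references.

```python
# Hypothetical MDP encoding (an assumption of this sketch):
# P[s][a] is a list of (probability, next_state, reward) triples.
def policy_iteration(P, gamma=0.5, tol=1e-10, margin=1e-8):
    """Return (V, policy) once no state has a strictly better action."""
    assert 0.0 <= gamma < 1.0, "sweeps below rely on discounting to converge"
    n = len(P)
    V = [0.0] * n
    policy = [0] * n  # start from the first action in every state

    def q(s, a):
        # Expected return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        # Policy evaluation: sweep until V matches the current policy.
        while True:
            delta = 0.0
            for s in range(n):
                v_new = q(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break

        # Policy improvement: switch only on a clear gain, so that
        # small evaluation errors cannot cause endless switching.
        stable = True
        for s in range(n):
            best = max(range(len(P[s])), key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + margin:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


# Illustrative three-state chain (made up for this example).
# State 0: action 0 stays in 0; action 1 moves to 1 (both reward 0).
# State 1: action 0 moves to 0; action 1 moves to 2 (both reward 0).
# State 2: a single action that stays in 2 and pays reward 1.
P_chain = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],
    [[(1.0, 0, 0.0)], [(1.0, 2, 0.0)]],
    [[(1.0, 2, 1.0)]],
]

V, policy = policy_iteration(P_chain)
print(policy)  # action index chosen in each state
```

With `gamma = 0.5`, state 2 is worth 1 / (1 - 0.5) = 2, so moving toward it is worth 1 from state 1 and 0.5 from state 0. Starting from the all-zeros policy, the sketch needs two improvement rounds to discover this, one state at a time.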