Optimal Policies for MDPs

In an MDP, a policy is a choice of what action to choose at each state An Optimal Policy is a policy where you are always choosing the action that maximizes the “return”/”utility” of the current state. The problem here is to find such an optimal policy from a given MDP.

Parameters

  • S|S|: number of states
  • A|A|: number of actions
  • n: transition matrix size = S×A×S|S|\times|A|\times|S|
  • Let S=A=n1/3|S|=|A|=n^{1/3}

Filters

Computational Model

Randomization

Approximation

Algorithms Table

Displaying 3 of 3 algorithms

See more
Puterman Modified Policy Iteration (MPI)1974O(n3)O(n^3)O(n)O(n)
Howard Policy Iteration (PI)1960O(n3)O(n^3)O(n)O(n)
Bellman Value Iteration (VI)1957O(2n)O(2^n)O(n)O(n)

Reductions Table

Insuffient Data to display table

Other relevant algorithms

Insuffient Data to display table