Howard Policy Iteration (PI) (Optimal Policies for MDPs)

From Algorithm Wiki

Time Complexity

$O(n^{3})$

Space Complexity

$O(n)$ words

(Auxiliary space: only the value function $V$ and the policy $\pi$ need to be stored, each of size $O(n)$)

Description
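The following is a minimal sketch of policy iteration, not a reference implementation. It alternates exact policy evaluation (solving the linear system $(I - \gamma P_\pi) V = R_\pi$ by Gaussian elimination, which costs $O(n^{3})$ per solve) with greedy policy improvement, and stops when no state changes its action. The names `solve_linear`, `q_value`, and `policy_iteration`, the array layout `P[a][s][t]` and `R[a][s]`, the discount factor `gamma`, and the two-state example MDP are all assumptions made for illustration; none of them come from the source.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (O(n^3))."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def q_value(P, R, gamma, V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(len(V)))


def policy_iteration(P, R, gamma):
    """Assumed layout: P[a][s][t] = transition probability, R[a][s] = expected
    reward, 0 <= gamma < 1. Returns (policy, V)."""
    num_actions, n = len(P), len(P[0])
    policy = [0] * n  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        A = [[(1.0 if s == t else 0.0) - gamma * P[policy[s]][s][t]
              for t in range(n)] for s in range(n)]
        b = [R[policy[s]][s] for s in range(n)]
        V = solve_linear(A, b)
        # Policy improvement: switch every improvable state to a greedy action.
        stable = True
        for s in range(n):
            best = max(range(num_actions),
                       key=lambda a: q_value(P, R, gamma, V, s, a))
            current = q_value(P, R, gamma, V, s, policy[s])
            # Small tolerance avoids cycling on floating-point ties.
            if q_value(P, R, gamma, V, s, best) > current + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical two-state MDP: action 0 = stay, action 1 = move to the other state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [1.0, 0.0]],  # move
]
R = [
    [0.0, 1.0],  # stay
    [0.0, 0.0],  # move
]
policy, V = policy_iteration(P, R, gamma=0.5)
print(policy, V)
```

In this toy instance the sketch moves from state 0 to state 1 and then stays there. The dense augmented matrix in `solve_linear` takes $O(n^{2})$ working memory, so the sketch is meant to show the structure of the iteration rather than to match the space bound listed above.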

Approximate?

Exact

Randomized?

No, deterministic

Model of Computation

Word/Real RAM

Year

1960

Reference

http://web.mit.edu/dimitrib/www/dpchapter.pdf