POMDPs
At (discrete) time step , the environment is assumed to be in some state . The agent then performs an action (control) , whereupon the environment (stochastically) changes to a new state . The agent doesn’t see the environment state, but instead receives an observation , which is some (stochastic) function of . (If , the POMDP reduces to a fully observed MDP.) In addition, the agent receives a special observation signal called the reward, . The POMDP is characterized by the state transition function , the observation function , and the reward function . The goal of the agent is to learn a policy which maps the observation history (trajectory) into an action to maximize ’s quality or value.
Related Problems
Filters
Computational Model
Randomization
Approximation
Algorithms Table
Insuffient Data to display table
Reductions Table
Insuffient Data to display table
Other relevant algorithms
Displaying 13 of 13 other relevant algorithms