Outputs the optimal actions given a world (the environment, which includes the
starting state), a value function and a generic policy (describing the allowed
actions), by picking at each step the action that leads to the highest-valued
next state.
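
A minimal sketch of the selection rule described above, not the original
implementation. All names here (greedy_actions, allowed_actions, transition,
value, max_steps) are hypothetical, and the sketch assumes a deterministic
world in which the next state can be computed from a state and an action.

```python
from typing import Callable, Hashable, List

State = Hashable
Action = Hashable


def greedy_actions(
    start: State,
    allowed_actions: Callable[[State], List[Action]],  # stands in for the generic policy
    transition: Callable[[State, Action], State],      # assumed deterministic world dynamics
    value: Callable[[State], float],                   # value function over states
    max_steps: int = 100,                              # hypothetical guard against endless loops
) -> List[Action]:
    """Repeatedly take the allowed action whose next state has the highest value."""
    state = start
    actions: List[Action] = []
    for _ in range(max_steps):
        candidates = allowed_actions(state)
        if not candidates:
            break  # no allowed actions: treat the state as terminal
        # Greedy step: score each action by the value of the state it leads to.
        best = max(candidates, key=lambda a: value(transition(state, a)))
        actions.append(best)
        state = transition(state, best)
    return actions


# Toy 1-D world: states 0..4, goal at 4, value grows toward the goal.
def toy_allowed(s: int) -> List[int]:
    if s == 4:
        return []  # goal reached, nothing left to do
    return [a for a in (-1, +1) if 0 <= s + a <= 4]


plan = greedy_actions(0, toy_allowed, lambda s, a: s + a, lambda s: float(s))
print(plan)  # [1, 1, 1, 1]
```

Ties are broken in favour of the first allowed action, since max returns the
first maximal element; a different tie-breaking rule may be wanted in practice.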
Boolean indicating whether we are currently in a saccade. (We can only be in a
saccade if we performed an action to enter one, so do all of these need to exist?)