sofian edited this page Dec 13, 2011 · 5 revisions

How many inputs and outputs should the NeuralNetwork I assign to a QLearningAgent instance have? What are these inputs and outputs exactly?

Short answer

  • Number of outputs = 1
  • Number of inputs = observation dimension + action dimension

Long answer

The artificial neural network (ANN) is a function approximator of the Q value. The Q value is an estimate of the expected reward over time. It is estimated by a function: Q(observation, action).

The inputs of the ANN are the concatenation of the observation vector (o_1, ..., o_m) and the action vector (a_1, ..., a_n). Since the actions in Qualia are typically integers, they are remapped to [0,1] before being fed to the network.

Thus the ANN output is:
output = Q(observation, action) = Q(o_1, ..., o_m, a_1, ..., a_n).

So the ANN will have 1 output and (m+n) inputs, where m = observation dimension and n = action dimension. (There is actually one extra "input", a bias that is always equal to 1, but you don't have to think about it: the NeuralNetwork class takes care of it.)
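As an illustration of the input layout described above, here is a minimal sketch in Python (not Qualia's own code). The helper `build_input` and its `n_choices` parameter are hypothetical, and the linear remapping a / (k - 1) is only an assumption about how an integer action could be scaled to [0,1]; the NeuralNetwork class may do it differently.

```python
def build_input(observation, action, n_choices):
    """Concatenate an observation with an action remapped to [0, 1].

    n_choices[i] is the number of values action component i can take
    (hypothetical parameter; assumes integer actions 0 .. k-1).
    """
    remapped = [
        a / (k - 1) if k > 1 else 0.0  # assumed linear remapping to [0, 1]
        for a, k in zip(action, n_choices)
    ]
    return list(observation) + remapped


# m = 2 observation values and n = 2 binary action components
# give m + n = 4 network inputs.
x = build_input([0.5, 0.7], [1, 0], n_choices=[2, 2])
print(len(x), x)  # 4 [0.5, 0.7, 1.0, 0.0]
```

The bias input is deliberately left out of the sketch, since it is not something the caller has to provide.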

When an action has to be chosen according to the ε-greedy policy (the most common one), the greediest action is found by fixing the observation vector (o_1, ..., o_m) and evaluating the ANN for every possible combination of the action vector (a_1, ..., a_n), keeping the one that gets the highest Q(observation, action). In other words it's looking for argmax_a Q(observation, a).
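The search described above can be sketched as follows, in Python rather than Qualia's own code. All names here (`greedy_action`, `epsilon_greedy_action`, `toy_q`, `n_choices`) are hypothetical, and `toy_q` is an arbitrary stand-in for the network, not a trained Q function.

```python
import itertools
import random


def greedy_action(q, observation, n_choices):
    """Return argmax_a q(observation, a) by trying every action combination."""
    all_actions = itertools.product(*(range(k) for k in n_choices))
    return max(all_actions, key=lambda a: q(observation, a))


def epsilon_greedy_action(q, observation, n_choices, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return tuple(rng.randrange(k) for k in n_choices)
    return greedy_action(q, observation, n_choices)


# Toy stand-in for the network: it favours actions whose components sum to 1,
# with a small bonus when the first component is 1.
def toy_q(observation, action):
    return -abs(sum(action) - 1) + 0.1 * action[0]


print(greedy_action(toy_q, (0.5, 0.7), n_choices=[2, 2]))  # (1, 0)
```

Note that the number of combinations grows multiplicatively with each action component, so this exhaustive search is only practical for small action spaces.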

For example, suppose we have an agent that controls two LEDs that it can switch ON or OFF and that observes two photosensor values in [0,1]. Suppose the observation vector is currently (0.5, 0.7). Also suppose that the policy has chosen to take the greedy action this time (i.e. not to take a random action).

Then the ANN will compute:
Q(0.5, 0.7, 0, 0) = 1.9324
Q(0.5, 0.7, 0, 1) = -2.1123
Q(0.5, 0.7, 1, 0) = 2.1084
Q(0.5, 0.7, 1, 1) = 0.2954

It finds that Q(0.5, 0.7, 1, 0) yields the highest value, so it will take action (1, 0), i.e. turn the first LED ON and the second one OFF.
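The same selection can be checked with a few lines of Python. This is only a sketch: the Q values are the ones listed in the example, stored in a plain dictionary (a hypothetical `q_values` table) instead of being computed by a network.

```python
# Q values from the example, for the fixed observation (0.5, 0.7),
# keyed by the action vector (first LED, second LED).
q_values = {
    (0, 0): 1.9324,
    (0, 1): -2.1123,
    (1, 0): 2.1084,
    (1, 1): 0.2954,
}

# argmax over the four possible actions
best_action = max(q_values, key=q_values.get)
print(best_action, q_values[best_action])  # (1, 0) 2.1084
```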
