Created Monday 14 July 2014
- `L` = the output layer.
- `(bb x, bb t)` = training example
- `N_j^((l))` = neuron `j` in layer `l`
- `t_j` = the expected output of `N_j^((L))` when the network input is `bb x`
- `o_j^((l))` = the actual output of `N_j^((l))` = impulse function value of `N_j^((l))` when the network input is `bb x`
- `v_j^((l))` = the induced local field of `N_j^((l))` when the network input is `bb x`
- `gamma_j^((l))` = the derivative of `N_j^((l))`'s activation function `phi`.
- In general, let superscript `l` denote layer `l`.
- Let weight `w_(kj)^((l+1))` connect `N_k^((l+1))` to `N_j^((l))`
```
Layer l               Layer (l+1)
neuron 1              neuron 1
...                   ...
...                   ...
neuron j ----w_kj---> neuron k
...                   ...
```
- Let `w_(k0)^((l))` be the bias of `N_k^((l))`
Note 1: Neuron `N_j^((l))` isn't necessarily fed input `bb x`. The network is fed input `bb x` and that in turn completely determines the inputs to neuron `N_j^((l))`.
Note 2: `t_j` is part of the expected network output, that is, the expected output of neuron `j` in the output layer.
Note 3: Recall that there are two outputs associated with a neuron at input `bb y` --- the induced local field `bb w * bb y` and the value of its impulse function `phi(bb w * bb y)`.
Note 4: `gamma_j` is a function; `v_j` and `o_j` are real numbers (a short sketch follows these notes).
Note 5: since `w_(k0)^((l))` is the bias, it's not really connected to a neuron in the previous layer (but see below).
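To make Notes 3 and 4 concrete, here's a minimal Python sketch of a single neuron. The logistic `phi` and the specific numbers are illustrative assumptions, not anything fixed by the notation:

```python
import numpy as np

def phi(v):
    """Logistic activation: one common choice of impulse function."""
    return 1.0 / (1.0 + np.exp(-v))

def gamma(v):
    """Derivative of the logistic phi, evaluated at v."""
    return phi(v) * (1.0 - phi(v))

# One neuron with weight vector w, fed input y (Note 3):
w = np.array([0.5, -0.3, 0.8])   # hypothetical weights; w[0] is the bias
y = np.array([1.0, 0.2, -0.4])   # y[0] = +1 pairs with the bias
v = w @ y                        # induced local field, bb w * bb y (a real number)
o = phi(v)                       # impulse function value, phi(bb w * bb y)
# gamma is a function; v and o are plain real numbers (Note 4).
```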
All Layers Have an Input Layer
Given input `bb x`, define `bb y^((l))` as the input that's fed to the next layer `(l+1)`:
- For `l = 0` (input layer), let `y_j^((0)) = x_j` (the "output" of the input layer is just the input).
- For `l > 0`, let `y_j^((l)) = o_j^((l))` (the output of neuron `N_j` in layer `l`).
- For all `l`, let `y_0^((l)) = +1` (map `y_0` to the bias).
Note: Since `w_(j0)^((l+1))` is always the bias, we can write the output of neuron `N_j^((l+1))` in terms of `bb y^((l))`:
`o_j^((l+1)) = phi(bb w * bb y^((l))) = phi(sum_k y_k^((l)) w_(jk)^((l+1)))`
where `bb w` is the weight vector of `N_j^((l+1))` and the sum `sum_k` runs over the neurons in the previous layer, including the `k = 0` bias term.
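A minimal Python sketch of this forward pass may help. The function names, the 2-3-1 network shape, the random weights, and the logistic `phi` are all illustrative assumptions:

```python
import numpy as np

def phi(v):
    """Logistic activation (an assumed choice)."""
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, weights):
    """Feed input bb x through the network, layer by layer.

    weights[l][j, k] plays the role of w_(jk)^((l+1)): row j selects a
    neuron in layer l+1, column k a neuron in layer l (k = 0 is the bias).
    """
    y = np.concatenate(([1.0], x))      # y^((0)): y_0 = +1, y_j = x_j
    for W in weights:
        o = phi(W @ y)                  # o_j^((l+1)) = phi(sum_k y_k w_jk)
        y = np.concatenate(([1.0], o))  # becomes y^((l+1)) for the next layer
    return o                            # outputs of the output layer L

# Hypothetical 2-3-1 network: each W has shape (size of l+1) x (1 + size of l).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(1, 4))]
print(forward(np.array([0.5, -1.0]), weights))
```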
More Than One Training Example
Up to this point, the notation doesn't take into account multiple training examples. To associate a term with the `i`th training example `(bb x[i], bb t[i])`, use square brackets (a short sketch follows the list). For example,
- `o_j^((l))[i]`= the output (value of the impulse function) of neuron `N_j^((l))` when the network is fed input `bb x[i]`.
- `delta_j^((l))[i]` = the local gradient of neuron `N_j^((l))` when the network is fed input `bb x[i]`.
- `y_j^((l))[i]` = the input fed from layer `l` to layer `l+1` (as defined above) when the network is fed input `bb x[i]`.
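As a sketch of how the bracket index threads through code (reusing the assumed 2-3-1 shape and logistic `phi` from above; the data and names are hypothetical):

```python
import numpy as np

def phi(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward_all(x, weights):
    """Return the list [o^((1)), ..., o^((L))] for one network input bb x."""
    y = np.concatenate(([1.0], x))
    per_layer = []
    for W in weights:
        o = phi(W @ y)
        per_layer.append(o)
        y = np.concatenate(([1.0], o))
    return per_layer

# Two hypothetical training examples (bb x[i], bb t[i]):
examples = [(np.array([0.5, -1.0]), np.array([1.0])),
            (np.array([0.1,  0.3]), np.array([0.0]))]

rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(1, 4))]

# outputs[i] holds [o^((1))[i], ..., o^((L))[i]]: the bracketed example
# index simply rides along with every per-example quantity.
outputs = [forward_all(x_i, weights) for x_i, t_i in examples]
```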