Created Thursday 12 December 2013
Step 0. Pick values for the learning-rate and momentum parameters.
Step 1. Initialize the weights.
Step 2. Go through each training example:
- 2a. Perform the forward propagation. Starting with the input layer, store each neuron's induced local field and impulse (activation) function value.
- 2b. Perform the backpropagation. Starting with the output layer, calculate the gradients of each neuron. Make sure to store the gradients.
- 2c. Update the cumulative learning terms for each weight of each neuron starting with the output layer. Do NOT apply the backprop formula yet.
- 2d. Discard the stored induced local fields, impulse functions, and gradients.
Step 3. Apply the backprop formula to each weight using its cumulative learning term and momentum term. Make sure to store the values of the previous weights.
Step 4. Construct the error function from the training examples.
Step 5. Repeat steps 2--4 until the error is smaller than a pre-selected value. If the error never converges, pick new backpropagation parameters and restart from Step 0.
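Steps 1 to 5 can be sketched in plain Python as below. This is only a minimal sketch under assumptions these notes do not fix: logistic (sigmoid) activations, a squared-error function, a bias weight stored as the last weight of each neuron, and weights initialized uniformly in [-0.5, 0.5]. All function names (init_weights, forward, backward, train_iteration, total_error, train) and the default parameter values are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)


def init_weights(sizes, rng):
    # Step 1. weights[l][j][i] connects input i of layer l to its neuron j;
    # the last i is the bias weight (an assumption of this sketch).
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(weights, x):
    # Step 2a. Store induced local fields and activation values per layer.
    fields, outputs = [], [list(x)]
    for layer in weights:
        inputs = outputs[-1] + [1.0]  # constant 1.0 feeds the bias weight
        v = [sum(w * a for w, a in zip(neuron, inputs)) for neuron in layer]
        fields.append(v)
        outputs.append([sigmoid(vj) for vj in v])
    return fields, outputs


def backward(weights, fields, outputs, target):
    # Step 2b. Gradients of each neuron, starting with the output layer.
    grads = [None] * len(weights)
    grads[-1] = [(d - y) * sigmoid_prime(v)
                 for d, y, v in zip(target, outputs[-1], fields[-1])]
    for l in range(len(weights) - 2, -1, -1):
        nxt = weights[l + 1]
        grads[l] = [sigmoid_prime(v)
                    * sum(grads[l + 1][k] * nxt[k][j] for k in range(len(nxt)))
                    for j, v in enumerate(fields[l])]
    return grads


def train_iteration(weights, prev_weights, examples, eta, alpha):
    # Step 2c. Accumulate learning terms over ALL examples; no update yet.
    terms = [[[0.0] * len(neuron) for neuron in layer] for layer in weights]
    for x, target in examples:
        fields, outputs = forward(weights, x)
        grads = backward(weights, fields, outputs, target)
        for l in range(len(weights) - 1, -1, -1):
            inputs = outputs[l] + [1.0]
            for j, delta in enumerate(grads[l]):
                for i, a in enumerate(inputs):
                    terms[l][j][i] += delta * a
        # Step 2d. fields, outputs and grads are dropped on the next pass.
    # Step 3. Update every weight from its cumulative term plus momentum.
    new_weights = [[[w + eta * t + alpha * (w - pw)
                     for w, t, pw in zip(nw, nt, npw)]
                    for nw, nt, npw in zip(lw, lt, lpw)]
                   for lw, lt, lpw in zip(weights, terms, prev_weights)]
    return new_weights, weights  # the old weights become the previous ones


def total_error(weights, examples):
    # Step 4. Half the summed squared error over all examples.
    return 0.5 * sum((d - y) ** 2
                     for x, target in examples
                     for d, y in zip(target, forward(weights, x)[1][-1]))


def train(examples, sizes, eta=0.5, alpha=0.9, tol=0.01,
          max_iters=10000, seed=0):
    weights = init_weights(sizes, random.Random(seed))
    # Previous weights start equal to the weights: first momentum term is 0.
    prev = [[neuron[:] for neuron in layer] for layer in weights]
    err = total_error(weights, examples)
    for _ in range(max_iters):  # Step 5
        if err < tol:
            break
        weights, prev = train_iteration(weights, prev, examples, eta, alpha)
        err = total_error(weights, examples)
    return weights, err
```

A call such as train(examples, [2, 2, 1]) takes examples as a list of (input list, target list) pairs. If the returned error is still above tol after max_iters, that is the "pick new parameters and restart" case from Step 5.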
Notes:
- It's called the backpropagation algorithm because you start from the output layer (Steps 2b and 2c) and work backwards to the input layer.
- Steps 2--4 are one iteration of the backpropagation algorithm.
- A common mistake is to adjust the weights before cycling through all the examples. You need to calculate a weight's cumulative learning term (from all the examples) before updating it.
- Stuck? Read the +Troubleshooting page.
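For reference, the "backprop formula" of Step 3 is not written out on this page. One common form, assumed here to be the batch delta rule with momentum, is given below. The symbols are assumptions: \eta is the learning parameter, \alpha the momentum parameter, \delta_j^{(p)} the gradient of neuron j for example p, y_i^{(p)} the input that weight w_{ji} receives for example p, P the number of examples, and n the iteration.

```latex
\Delta w_{ji}(n) = \eta \sum_{p=1}^{P} \delta_j^{(p)} \, y_i^{(p)}
                 + \alpha \, \bigl( w_{ji}(n) - w_{ji}(n-1) \bigr)

w_{ji}(n+1) = w_{ji}(n) + \Delta w_{ji}(n)
```

The sum over p is the cumulative learning term built up in Step 2c, which is why the weights cannot be updated until every example has been processed. The difference w_{ji}(n) - w_{ji}(n-1) is why only the previous weight values need to be kept.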
Required Memory and Data Structures
You'll need data structure(s) to store:
- induced local fields and impulse function values of the neurons in each layer. These are only used to find the gradients and the learning terms, so you do not need to store them for each example. The same data structure(s) can be reused for every example.
- gradients of the neurons in each layer. These are only used to find the learning terms, so they too can be kept in data structure(s) that are reused for every example.
- learning terms for the weights of the neurons in each layer. These accumulate over all examples, so all you need is one set of data structure(s), reset to zero at the start of each iteration.
- previous weight values (these are used by the momentum term). You do not need to store the complete history of the weights, just the previous values.
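The storage listed above could be laid out as in the following sketch. It assumes a fully connected layered network described by a list of layer sizes (input layer first) with one bias weight per neuron. The names BackpropBuffers, allocate_buffers, and zero_matrices are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[Vector]


@dataclass
class BackpropBuffers:
    # Overwritten for every example (shared scratch space).
    fields: List[Vector]        # induced local field of each neuron
    activations: List[Vector]   # impulse function value of each neuron
    gradients: List[Vector]     # gradient of each neuron
    # One set per iteration, accumulated over all examples.
    learning_terms: List[Matrix]
    # Only the immediately preceding weights, for the momentum term.
    prev_weights: List[Matrix]

    def reset_learning_terms(self) -> None:
        # Call at the start of each iteration, before the first example.
        for layer in self.learning_terms:
            for row in layer:
                row[:] = [0.0] * len(row)


def zero_matrices(sizes):
    # One (n_out x (n_in + 1)) matrix per layer; the +1 is the bias weight.
    return [[[0.0] * (n_in + 1) for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def allocate_buffers(sizes):
    # sizes = [inputs, hidden..., outputs]; the input layer has no neurons
    # of its own, so per-neuron buffers start at sizes[1].
    return BackpropBuffers(
        fields=[[0.0] * n for n in sizes[1:]],
        activations=[[0.0] * n for n in sizes[1:]],
        gradients=[[0.0] * n for n in sizes[1:]],
        learning_terms=zero_matrices(sizes),
        prev_weights=zero_matrices(sizes),
    )
```

With this layout the memory cost does not grow with the number of training examples. In this sketch prev_weights starts at zero, so the caller would copy the initial weights into it before the first update to make the first momentum term vanish.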