
Bearded-android-docs

Implementation


Created Thursday 03 April 2014

Local Receptive Fields

In a fully-connected neural network, each neuron is connected to each input.

Input	Layer

x1------N1
       /
x2----/

When a neuron has a local receptive field, it is connected only to a subset of the input layer. In a :NeuralNetworks:ConvolutionalNetwork, the inputs are arranged in a 2D grid, and each neuron is connected to a square region of that grid.

InputGrid --> Layer
		  
X X X * *     N1(X)
X X X * *
X X X * *
* * * * *
* * * * *

In the diagram, the input grid is 5 x 5. Neuron `N1` is connected to the inputs marked `X`.
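
As a minimal sketch of the diagram above, the following computes the output of one neuron with a 3x3 local receptive field on a 5x5 grid. The function name `neuron_output`, the weight values, and the choice of `tanh` as the activation are assumptions made for illustration; they are not part of the original notes.

```python
import math

def neuron_output(grid, top, left, size, weights, bias):
    """Weighted sum over a size x size square of the grid, then tanh (assumed activation)."""
    total = bias
    for r in range(size):
        for c in range(size):
            total += weights[r][c] * grid[top + r][left + c]
    return math.tanh(total)

# 5x5 input grid: the top-left 3x3 square (the X's) holds 1.0, everything else 5.0.
grid = [[1.0 if (r < 3 and c < 3) else 5.0 for c in range(5)] for r in range(5)]
weights = [[0.1] * 3 for _ in range(3)]  # illustrative values only

# N1 sees only the top-left 3x3 square; the inputs marked * do not affect it.
n1 = neuron_output(grid, 0, 0, 3, weights, bias=0.0)
```

Changing any input outside the 3x3 square leaves `n1` unchanged, which is the point of a local receptive field.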

Shared Weights

In a typical neural network, each neuron has its own set of weights.

Layer
N1: w10 w11 w12 w13
N2: w20 w21 w22 w23
N3: w30 w31 w32 w33

Here weight wij connects neuron `N_i` to input `x_j` (wi0 is the special case of the bias). The wij's are not necessarily equal: each neuron has its own set of weights. In a CNN, all neurons of a given layer share one common set of weights.

Layer
N1: w0 w1 w2 w3
N2: w0 w1 w2 w3
N3: w0 w1 w2 w3
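
To make the difference concrete, here is a small sketch that counts the trainable values in the two layers drawn above (3 neurons, 3 inputs plus a bias each). The helper `count_parameters` is a hypothetical name introduced only for this example.

```python
def count_parameters(num_neurons, inputs_per_neuron, shared):
    """Number of trainable values in a layer; each weight set includes one bias."""
    per_set = inputs_per_neuron + 1  # the weights plus the bias w0
    return per_set if shared else num_neurons * per_set

separate = count_parameters(3, 3, shared=False)  # 12 values: w10 .. w33
shared = count_parameters(3, 3, shared=True)     # 4 values: w0 .. w3
```

With shared weights the parameter count no longer grows with the number of neurons in the layer.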

Feature Maps

We said above that in a CNN each neuron is connected to a square in the input grid, but which squares? To build a feature map, we scan the input grid with the neuron's receptive field, one position at a time. The diagram below shows a 5x5 input grid and a layer of 9 neurons, each with a 3x3 local receptive field, producing a 3x3 output grid called the feature map.

InputGrid --> Layer --> OutputGrid

A A A * *     N1(A)     A * *
A A A * *     N2(B)     * * *
A A A * *     N3(C)     * * *
* * * * *     ...
* * * * *     N9(I)

InputGrid --> Layer --> OutputGrid

* B B B *     N1(A)     A B *
* B B B *     N2(B)     * * *
* B B B *     N3(C)     * * *
* * * * *     ...
* * * * *     N9(I)

InputGrid --> Layer --> OutputGrid

* * C C C     N1(A)     A B C
* * C C C     N2(B)     * * *
* * C C C     N3(C)     * * *
* * * * *     ...
* * * * *     N9(I)

...

InputGrid --> Layer --> OutputGrid

* * * * *     N1(A)     A B C
* * * * *     N2(B)     D E F
* * I I I     N3(C)     G H I
* * I I I     ...
* * I I I     N9(I)

Some observations:

The input squares overlap.
A feature map can be implemented with a single neuron. (The 9 neurons all share the same set of weights and the same activation function, so they are identical.)

Notes:

While a feature map can be implemented with a single neuron, using several neurons that share the same set of weights lets you take advantage of fast matrix-operation algorithms.

Aside:

The operation that produces a feature map is equivalent to a convolution with a small kernel followed by a squashing function. (Convolutions are commonly used in image processing.)
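
The scanning procedure described in this section can be sketched as follows. This is a minimal illustration, assuming a square input grid, a stride of one position, and `tanh` as the squashing function; the name `feature_map` and the weight values are hypothetical.

```python
import math

def feature_map(grid, weights, bias, squash=math.tanh):
    """Slide one shared k x k weight set over a square grid, one position at a time."""
    k = len(weights)
    out_size = len(grid) - k + 1  # a 5x5 grid with a 3x3 field gives 3x3
    out = []
    for top in range(out_size):
        row = []
        for left in range(out_size):
            total = bias
            for r in range(k):
                for c in range(k):
                    total += weights[r][c] * grid[top + r][left + c]
            row.append(squash(total))
        out.append(row)
    return out

# 5x5 input grid holding the values 0..24, scanned with a 3x3 receptive field.
grid = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
weights = [[0.01] * 3 for _ in range(3)]  # illustrative values only
fmap = feature_map(grid, weights, bias=0.0)  # 3x3 output grid (A .. I)
```

Every output position reuses the same `weights` and `bias`, so the loop body plays the role of the single shared neuron.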

Subsampling Layer

A subsampling layer is just like a feature map, with two exceptions:

The input squares are adjacent but do not overlap.
The weights (except for the bias) are all constrained to be equal, so a subsampling layer consists of one neuron with a single weight `w` and a bias `b`.

It's called a subsampling layer because when the weight `w` equals the reciprocal of the number of units (or pixels, when the input grid is an image) in the input square, each input square gets averaged into a single value, so the output grid is a lower-resolution summary of the input.

The diagram below shows a 4x4 input grid and a layer with a single neuron that has a 2x2 local receptive field, producing a 2x2 output grid.

InputGrid --> Layer --> OutputGrid

A A * *       N(A)      A *
A A * *                 * *
* * * *
* * * *

InputGrid --> Layer --> OutputGrid

* * B B       N(B)      A B
* * B B                 * *
* * * *
* * * *

InputGrid --> Layer --> OutputGrid

* * * *       N(C)      A B
* * * *                 C *
C C * *
C C * *

InputGrid --> Layer --> OutputGrid

* * * *       N(D)      A B
* * * *                 C D
* * D D
* * D D
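
The subsampling step in the diagram can be sketched as below. This is an illustration under stated assumptions: the grid side is divisible by the square size, the function name `subsample` is hypothetical, and the example call uses an identity squashing function with `w = 0.25` (one over the 4 units in a 2x2 square) so that the averaging is visible in the output.

```python
import math

def subsample(grid, k, w, b, squash=math.tanh):
    """Apply one weight w and one bias b to each non-overlapping k x k square."""
    out = []
    for top in range(0, len(grid), k):          # step by k: squares do not overlap
        row = []
        for left in range(0, len(grid[0]), k):
            total = sum(grid[top + r][left + c] for r in range(k) for c in range(k))
            row.append(squash(w * total + b))
        out.append(row)
    return out

grid = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]]

# Identity squashing is used here only to show the plain average of each square.
pooled = subsample(grid, k=2, w=0.25, b=0.0, squash=lambda x: x)
# pooled == [[1.0, 2.0], [3.0, 4.0]]
```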

Putting It All Together

A CNN usually consists of a sequence of alternating feature maps and subsampling layers followed by a final fully-connected output layer. Simple, right? The complexity is in how those layers are chained together. See :NeuralNetworks:ConvolutionalNetwork:LeNet4 for an example.
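
As a rough illustration of how grid sizes change along such a chain, the sketch below traces the side length of the grid through a sequence of layers. The helper names and the 6x6 example network are hypothetical and are not taken from LeNet4; the size rules follow the feature map and subsampling descriptions above (overlapping squares moved one position at a time, and adjacent non-overlapping squares).

```python
def feature_map_size(n, k):
    """Side of the output grid when a k x k field scans an n x n grid one step at a time."""
    return n - k + 1

def subsample_size(n, k):
    """Side of the output grid when an n x n grid is tiled by non-overlapping k x k squares."""
    if n % k != 0:
        raise ValueError("grid side must be divisible by the square side")
    return n // k

def trace_sizes(input_size, layers):
    """Return the grid side after each layer; layers is a list of (kind, k) pairs."""
    sizes = [input_size]
    for kind, k in layers:
        n = sizes[-1]
        if kind == "feature_map":
            sizes.append(feature_map_size(n, k))
        else:
            sizes.append(subsample_size(n, k))
    return sizes

# Hypothetical chain: 6x6 input, 3x3 feature map, 2x2 subsampling.
sizes = trace_sizes(6, [("feature_map", 3), ("subsample", 2)])
# sizes == [6, 4, 2]; a final fully-connected layer would then read 2 * 2 = 4 values.
```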

