Introduction & concepts

This library implements standard and recurrent (i.e. for structures) neural networks in term of graphs. The approach is data-flow oriented: inputs are fed into some nodes of the graph, output are asked to other nodes, and the nodes (and links) themselves perform the necessary computations, trying to minimize the chance of doing unnecessary work (for example, calculating twice the same result).

STL facilities such as sets, maps and vectors are used extensively to obtain a code easier to read and maintain. The implementation is not space-efficient, and in some places not even time-efficient: the design was aimed at consistency and ease of use.

The fundamental element is the graph, consisting in a set of nodes and links, with elementary memory-management (i.e. all elements are destroyed upon destruction of the graph itself).
A neural network is just a particular kind of graph, where we distinguish input and output nodes. These are organized in a vector, to easily set and read their values in the proper order. As said, all the calculations are performed by neurons and weights, both forward, to obtain network outputs, and backward, to obtain gradients for use in gradient-based algorithms. The gradients are computed using Back Propagation: before obtaining gradient information it is necessary to set the "delta" values of output neurons. What we want to calculate is the derivative, with respect to the weight values, of some function of the network's outputs: the "delta"s are the values of the derivative of this function with respect to the network's outputs.

The library also supports recurrent neural networks to approximate functions whose domain is a set of labeled graphs. Each recurrent network has both local and recurrent inputs and outputs, and the special weights used can calculate gradients of functions of either (or both) kind of outputs: one just need to assign the "delta" values to the proper output neurons. The algorithm used is Back Propagation Through Structure.

Both graphs and neural networks can be written to and read from streams. In the case of graphs, the topology is preserved; in the case of neural networks, also weights' values and neurons' activation functions are preserved. For this latter feature to work, it is necessary to give each activation function a name: this name will get written to the stream, and will be used to retrieve the appropriate function when reading it back.

Memory considerations

Here follows a table with memory occupation for objects of each class defined by the library. Note that for container classes (graphs and networks) the given size refers to container only: the size of the components (for example, neurons and weights) must be added separately.

Some assumptions are made:

that all pointers have the same size
that the size of an object is the sum of the sizes of its members
that when deriving a class, the size of objects of the new class is exactly equal to the size of objects of the parent class plus the size of added members

The second assumption is just to simplify notation: the overhead of object allocation should be added to all sizes given below. This overhead, using gcc 2.95.2 under Linux on an i386, is 4 bytes for classes with virtual methods, and 0 bytes for classes without virtual methods. Moreover, there can be aligment issues: gcc seems to round sizes up to 4-byte multiples.

Notation for the following table:

[type]: the value of sizeof(type). [*] is the size of a pointer.
SET(n,s): the memory occupation of the STL's std::set<> containing n objects of size s
VECTOR(n,s): the memory occupation of the STL's std::vector<> containing n objects of size s
MAP(n,s1,s2): the memory occupation of the STL's std::map<> containing n objects of size s2 each with a key of size s1

Object class	Memory occupation	Notes
`Node`	SET(indegree,[])+SET(outdegree,[])+[bool]
`Link`	2[]+[bool]
`Graph`	SET(nodenum,[])+SET(linknum,[])	nodenum=number of nodes linknum=number of links
`NodeInserter`	[*]
`LinkInserter`	[*]
`Weight`	[Link]+[double]
`Neuron`	[Node]+2[]+3*[double]+[bool]
`InputTerminal`	[Neuron]+[double]
`NeuralNet`	[Graph]+VECTOR(outs,[])+VECTOR(ins,[])	outs=number of output neurons ins=number of input neurons
`NeuronInserter`	2[]
`ITInserter`	2[]
`LabelNode`	[Node]+VECTOR(llen,[double])	llen=length of the label
`NumLink`	[Link]+[double]
`LabelGraph`	[Graph]
`RWeight`	[Weight]+SET(0,[*])	this size refers to weights in a folded network
`RWeight`	[Weight]+SET(graphsize,[*])	graphsize=number of nodes in the input graph this size refers to weights in an unfolded networks
`CWeight`	[Weight]+[*]
`RecNeuralNet`	[Graph]+VECTOR(ro,[])+VECTOR(ri,[])+VECTOR(lo,[])+ +VECTOR(li,[])+MAP(0,[],[])	ro=number of recurrent output neurons ri=number of recurrent input neurons lo=number of local output neurons li=number of local input neurons this size refers to the folded network
`RecNeuralNet`	(graphsize+1)([Graph]+VECTOR(ro,[])+VECTOR(ri,[])+ +VECTOR(lo,[])+VECTOR(li,[]))+ +linknum(VECTOR(ro,[])+ro[Weight])+MAP(graphsize,[],[])	graphsize=number of nodes in the input graph linknum=number of links in the input graph this size refers to the unfolded network