In recent years the field of neuromorphic low-power systems has gained significant momentum, spurring brain-inspired hardware systems that operate on principles fundamentally different from those of standard digital computers and thereby consume orders of magnitude less power.
However, their wider use is still hindered by the lack of algorithms that can harness the strengths of such architectures. While neuromorphic adaptations of representation learning algorithms are now emerging, the efficient processing of temporal sequences or variable-length inputs remains difficult, partly due to challenges in representing and configuring the dynamics of spiking neural networks.
Recurrent neural networks (RNNs) are widely used in machine learning to solve a variety of sequence learning tasks. In this work we present a train-and-constrain methodology that enables the mapping of machine-learned (Elman) RNNs onto a substrate of spiking neurons while remaining compatible with the capabilities of current and near-future neuromorphic systems. This “train-and-constrain” method consists of first training RNNs using backpropagation through time, then discretizing the weights, and finally converting them to spiking RNNs by matching the responses of artificial neurons with those of the spiking neurons.
We demonstrate our approach on a natural language processing task (question classification), carrying out the entire mapping process for the recurrent layer of the network on IBM’s Neurosynaptic System “TrueNorth”, a spike-based digital neuromorphic hardware architecture. TrueNorth imposes specific constraints on connectivity and on neural and synaptic parameters.
To satisfy these constraints, it was necessary to discretize the synaptic weights to 16 levels, to discretize the neural activities to 16 levels, and to limit fan-in to 64 inputs. Surprisingly, we find that short synaptic delays are sufficient to implement the dynamical (temporal) aspect of the RNN in the question classification task. Furthermore, we observed that the discretization of the neural activities is beneficial to our train-and-constrain approach. The hardware-constrained model achieved 74% accuracy in question classification while using less than 0.025% of the cores on one TrueNorth chip, resulting in an estimated power consumption of ≈ 17 μW.
MATERIAL & METHODS
The network consists of a projection layer (48 units), a recurrent layer (16 units) and a softmax layer for classification (6 units); see figure 1. This combination of layer types, here a so-called fully-connected (or projection) layer, an Elman (or simple) RNN layer and a softmax layer, is common in machine learning neural networks. The dimensions of the projection layer and the recurrent layer were constrained to fit on one core of a TrueNorth chip (see below). Furthermore, because they perform well and ease the mapping to TrueNorth spiking neurons, the network uses rectified linear units (ReLUs) without biases throughout.
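The layer sizes described above can be sketched as a plain forward pass. This is a minimal illustration, not the authors' implementation: the input dimension `WORD_DIM`, the random weights, the helper names, and the choice to classify from the final hidden state are all assumptions made for the example.

```python
import math
import random

WORD_DIM = 64   # hypothetical input size; not stated in the text
N_PROJ = 48     # projection layer units
N_REC = 16      # recurrent layer units
N_CLASSES = 6   # softmax outputs


def relu(v):
    """Rectified linear unit, applied element-wise (no bias term)."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(v):
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Untrained placeholder weights, for illustration only."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def classify(sequence, W_proj, W_in, W_rec, W_out):
    """Run projection -> Elman recurrence -> softmax over one input sequence."""
    h = [0.0] * N_REC
    for x in sequence:
        p = relu(matvec(W_proj, x))
        # Elman update: new state depends on the current input and the previous state.
        pre = [a + b for a, b in zip(matvec(W_in, p), matvec(W_rec, h))]
        h = relu(pre)
    # Assumption: the class is read out from the final hidden state.
    return softmax(matvec(W_out, h))


rng = random.Random(0)
W_proj = random_matrix(N_PROJ, WORD_DIM, rng)
W_in = random_matrix(N_REC, N_PROJ, rng)
W_rec = random_matrix(N_REC, N_REC, rng)
W_out = random_matrix(N_CLASSES, N_REC, rng)
question = [[rng.uniform(0, 1) for _ in range(WORD_DIM)] for _ in range(5)]
probs = classify(question, W_proj, W_in, W_rec, W_out)
```

In this sketch each recurrent unit receives 48 projection inputs plus 16 recurrent inputs, that is 64 in total, which equals the fan-in limit of 64 quoted earlier.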
For all four setups we used the question classification test set introduced. The respective accuracy of all four setups is shown in table 3. Training of the original ReLUNN with floating-point weights yields a classification accuracy of 85%. The variance of the results was obtained by using different initializations of the parameters of the original network. When reducing the precision of the weights to 4 bits, the accuracy dropped to 72.2%. In the next step we discretized the hidden state to 4 bits.
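The two discretization steps can be illustrated with simple rounding. The following is a sketch under stated assumptions rather than the scheme used in the study: the signed 4-bit code range (-8 to 7), the per-layer scale derived from the largest weight, and the clipping bound `a_max` for the hidden state are all hypothetical choices.

```python
def quantize_weights(weights, bits=4):
    """Map weights to signed integer codes with 2**bits levels (assumed scheme).

    Returns the integer codes and the scale that maps codes back to weights.
    """
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # -8..7 for 4 bits
    w_max = max(abs(w) for w in weights)
    if w_max == 0.0:
        return [0] * len(weights), 1.0
    scale = w_max / q_max
    codes = [min(max(round(w / scale), q_min), q_max) for w in weights]
    return codes, scale


def quantize_activation(a, a_max, levels=16):
    """Clip a ReLU activation to [0, a_max] and round it to one of `levels` integer steps."""
    clipped = min(max(a, 0.0), a_max)
    return round(clipped / a_max * (levels - 1))


weights = [-0.83, -0.12, 0.0, 0.07, 0.4, 0.91]
codes, scale = quantize_weights(weights)
restored = [c * scale for c in codes]  # dequantized weights used by the network
q_state = [quantize_activation(a, a_max=1.0) for a in [-0.3, 0.2, 0.66, 1.7]]
```

Comparing `weights` with `restored` shows the rounding error that simple rounding introduces for each weight.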
Our results demonstrate a proof-of-concept recurrent neural network that can be trained offline and afterwards mapped onto the highly power-efficient TrueNorth chip. Furthermore, we show that synaptic delays are sufficient for supporting the temporal dynamics of simple recurrent neural networks. Using a 15-tick delay for “storing” the state of the neurons corresponds to discretizing the state to 4 bits.
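One way to read the correspondence between a 15-tick delay and a 4-bit state, offered here as an interpretation rather than the authors' stated derivation, is that a window of 15 ticks can hold between 0 and 15 spikes, which gives 16 distinguishable levels. The sketch below assumes a simple spike-count code; the function names are hypothetical.

```python
import math

DELAY_TICKS = 15  # delay quoted in the text


def encode(level, ticks=DELAY_TICKS):
    """Represent an integer state as a spike train: `level` spikes within `ticks` ticks."""
    if not 0 <= level <= ticks:
        raise ValueError("level must lie between 0 and the number of ticks")
    return [1] * level + [0] * (ticks - level)


def decode(spikes):
    """Recover the state by counting spikes in the window."""
    return sum(spikes)


levels = DELAY_TICKS + 1   # spike counts 0..15
bits = math.log2(levels)   # number of bits needed to index those levels
```

Under this reading, a state outside the range 0 to 15 cannot be stored, which is why the hidden state has to be clipped and rounded before it is fed back.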
However, while the accuracy of the machine learning RNN is comparable to reported results in the original and subsequent studies (84%, 86.2% and 85.6%), there is a performance gap between the machine learning RNN and the TrueNorth network. A closer look at the four different models shows why this gap exists. The biggest drop in performance is due to the discretization of the synaptic weights, as can be seen by comparing the first and the second model in table 3.
However, this performance decrease due to discretization is expected and has been the topic of other studies. In order to reduce performance losses due to weight discretization, it is possible to choose better discretization methods than simple rounding. For example, discretization can be included in the training of the network (by using rounded weights during the forward pass of backpropagation), or the weights can be rounded probabilistically after training.
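The second option, probabilistic rounding after training, can be sketched as follows. This is a generic stochastic-rounding routine, assumed for illustration and not taken from the study; the grid spacing `step` and the function name are hypothetical. The first option (rounded weights in the forward pass) is not shown.

```python
import math
import random


def stochastic_round(value, step, rng):
    """Round `value` to a multiple of `step`, rounding up with probability
    equal to the fractional distance to the next grid point."""
    scaled = value / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
```

Each sample lands on one of the two neighbouring grid points, while the average over many samples stays close to the original value, so the rounding is unbiased in expectation, unlike deterministic rounding.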
Source: University Zurich
Authors: Peter U. Diehl | Guido Zarrella | Andrew Cassidy | Bruno U. Pedroni | Emre Neftci