Analyzing and Accelerating the Bottlenecks of Training Deep SNNs With Backpropagation.

Abstract:

Spiking neural networks (SNNs) with the event-driven manner of transmitting spikes consume ultra-low power on neuromorphic chips. However, training deep SNNs is still challenging compared to convolutional neural networks (CNNs). The SNN training algorithms have not achieved the same performance as CNNs. In this letter, we aim to understand the intrinsic limitations of SNN training to design better algorithms. First, the pros and cons of typical SNN training algorithms are analyzed. Then it is found that the spatiotemporal backpropagation algorithm (STBP) has potential in training deep SNNs due to its simplicity and fast convergence. Later, the main bottlenecks of the STBP algorithm are analyzed, and three conditions for training deep SNNs with the STBP algorithm are derived. By analyzing the connection between CNNs and SNNs, we propose a weight initialization algorithm to satisfy the three conditions. Moreover, we propose an error minimization method and a modified loss function to further improve the training performance. Experimental results show that the proposed method achieves 91.53% accuracy on the CIFAR10 data set, a 1% accuracy increase over the STBP algorithm, and decreases the training epochs on the MNIST data set to 15 epochs (over 13 times speed-up compared to the STBP algorithm). The proposed method also decreases classification latency by over 25 times compared to the CNN-SNN conversion algorithms. In addition, the proposed method works robustly for very deep SNNs, while the STBP algorithm fails in a 19-layer SNN.

journal_name

Neural Comput

journal_title

Neural computation

authors

Chen R,Li L

doi

10.1162/neco_a_01319

subject

Has Abstract

pub_date

2020-12-01 00:00:00

pages

2557-2600

issue

12

eissn

1530-888X

issn

0899-7667

journal_volume

32

pub_type

Journal Article
  • Regulation of ambient GABA levels by neuron-glia signaling for reliable perception of multisensory events.

    abstract::Activities of sensory-specific cortices are known to be suppressed when presented with a different sensory modality stimulus. This is referred to as cross-modal inhibition, for which the conventional synaptic mechanism is unlikely to work. Interestingly, the cross-modal inhibition could be eliminated when presented wi...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00356

    authors: Hoshino O

    update_date:2012-11-01 00:00:00

  • Sufficient dimension reduction via squared-loss mutual information estimation.

    abstract::The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00407

    authors: Suzuki T,Sugiyama M

    update_date:2013-03-01 00:00:00

  • Resonator Networks, 2: Factorization Performance and Capacity Compared to Optimization-Based Methods.

    abstract::We develop theoretical foundations of resonator networks, a new type of recurrent neural network introduced in Frady, Kent, Olshausen, and Sommer (2020), a companion article in this issue, to solve a high-dimensional vector factorization problem arising in Vector Symbolic Architectures. Given a composite vector formed...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01329

    authors: Kent SJ,Frady EP,Sommer FT,Olshausen BA

    update_date:2020-12-01 00:00:00

  • Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes.

    abstract::We extend the neural Turing machine (NTM) model into a dynamic neural Turing machine (D-NTM) by introducing trainable address vectors. This addressing scheme maintains for each memory cell two separate vectors, content and address vectors. This allows the D-NTM to learn a wide variety of location-based addressing stra...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01060

    authors: Gulcehre C,Chandar S,Cho K,Bengio Y

    update_date:2018-04-01 00:00:00

  • Positive Neural Networks in Discrete Time Implement Monotone-Regular Behaviors.

    abstract::We study the expressive power of positive neural networks. The model uses positive connection weights and multiple input neurons. Different behaviors can be expressed by varying the connection weights. We show that in discrete time and in the absence of noise, the class of positive neural networks captures the so-call...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00789

    authors: Ameloot TJ,Van den Bussche J

    update_date:2015-12-01 00:00:00

  • Mirror symmetric topographic maps can arise from activity-dependent synaptic changes.

    abstract::Multiple adjacent, roughly mirror-image topographic maps are commonly observed in the sensory neocortex of many species. The cortical regions occupied by these maps are generally believed to be determined initially by genetically controlled chemical markers during development, with thalamocortical afferent activity su...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766053491904

    authors: Schulz R,Reggia JA

    update_date:2005-05-01 00:00:00

  • Piecewise-linear neural networks and their relationship to rule extraction from data.

    abstract::This article addresses the topic of extracting logical rules from data by means of artificial neural networks. The approach based on piecewise linear neural networks is revisited, which has already been used for the extraction of Boolean rules in the past, and it is shown that this approach can be important also for t...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2006.18.11.2813

    authors: Holena M

    update_date:2006-11-01 00:00:00

  • Methods for combining experts' probability assessments.

    abstract::This article reviews statistical techniques for combining multiple probability distributions. The framework is that of a decision maker who consults several experts regarding some events. The experts express their opinions in the form of probability distributions. The decision maker must aggregate the experts' distrib...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/neco.1995.7.5.867

    authors: Jacobs RA

    update_date:1995-09-01 00:00:00

  • Computing with self-excitatory cliques: A model and an application to hyperacuity-scale computation in visual cortex.

    abstract::We present a model of visual computation based on tightly inter-connected cliques of pyramidal cells. It leads to a formal theory of cell assemblies, a specific relationship between correlated firing patterns and abstract functionality, and a direct calculation relating estimates of cortical cell counts to orientation...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/089976699300016782

    authors: Miller DA,Zucker SW

    update_date:1999-01-01 00:00:00

  • Transmission of population-coded information.

    abstract::As neural activity is transmitted through the nervous system, neuronal noise degrades the encoded information and limits performance. It is therefore important to know how information loss can be prevented. We study this question in the context of neural population codes. Using Fisher information, we show how informat...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00227

    authors: Renart A,van Rossum MC

    update_date:2012-02-01 00:00:00

  • Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks.

    abstract::Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms like backpropagation. Although backpropagation is efficient, its implementation in analog VLSI requires excessive computational hardware. In this paper we show that, for anal...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1991.3.4.546

    authors: Jabri M,Flower B

    update_date:1991-01-01 00:00:00

  • Kernels for longitudinal data with variable sequence length and sampling intervals.

    abstract::We develop several kernel methods for classification of longitudinal data and apply them to detect cognitive decline in the elderly. We first develop mixed-effects models, a type of hierarchical empirical Bayes generative models, for the time series. After demonstrating their utility in likelihood ratio classifiers (a...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00164

    authors: Lu Z,Leen TK,Kaye J

    update_date:2011-09-01 00:00:00

  • Some sampling properties of common phase estimators.

    abstract::The instantaneous phase of neural rhythms is important to many neuroscience-related studies. In this letter, we show that the statistical sampling properties of three instantaneous phase estimators commonly employed to analyze neuroscience data share common features, allowing an analytical investigation into their beh...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00422

    authors: Lepage KQ,Kramer MA,Eden UT

    update_date:2013-04-01 00:00:00

  • Online adaptive decision trees.

    abstract::Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in the online mode, and they are not adaptive to ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766041336396

    authors: Basak J

    update_date:2004-09-01 00:00:00

  • Rapid processing and unsupervised learning in a model of the cortical macrocolumn.

    abstract::We study a model of the cortical macrocolumn consisting of a collection of inhibitorily coupled minicolumns. The proposed system overcomes several severe deficits of systems based on single neurons as cerebral functional units, notably limited robustness to damage and unrealistically large computation time. Motivated ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976604772744893

    authors: Lücke J,von der Malsburg C

    update_date:2004-03-01 00:00:00

  • Evaluating auditory performance limits: II. One-parameter discrimination with random-level variation.

    abstract::Previous studies have combined analytical models of stochastic neural responses with signal detection theory (SDT) to predict psychophysical performance limits; however, these studies have typically been limited to simple models and simple psychophysical tasks. A companion article in this issue ("Evaluating Auditory P...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976601750541813

    authors: Heinz MG,Colburn HS,Carney LH

    update_date:2001-10-01 00:00:00

  • Spiking neural P systems with a generalized use of rules.

    abstract::Spiking neural P systems (SN P systems) are a class of distributed parallel computing devices inspired by spiking neurons, where the spiking rules are usually used in a sequential way (an applicable rule is applied one time at a step) or an exhaustive way (an applicable rule is applied as many times as possible at a s...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00665

    authors: Zhang X,Wang B,Pan L

    update_date:2014-12-01 00:00:00

  • A Reservoir Computing Model of Reward-Modulated Motor Learning and Automaticity.

    abstract::Reservoir computing is a biologically inspired class of learning algorithms in which the intrinsic dynamics of a recurrent neural network are mined to produce target time series. Most existing reservoir computing algorithms rely on fully supervised learning rules, which require access to an exact copy of the target re...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01198

    authors: Pyle R,Rosenbaum R

    update_date:2019-07-01 00:00:00

  • On the emergence of rules in neural networks.

    abstract::A simple associationist neural network learns to factor abstract rules (i.e., grammars) from sequences of arbitrary input symbols by inventing abstract representations that accommodate unseen symbol sets as well as unseen but similar grammars. The neural network is shown to have the ability to transfer grammatical kno...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602320264079

    authors: Hanson SJ,Negishi M

    update_date:2002-09-01 00:00:00

  • A Mathematical Analysis of Memory Lifetime in a Simple Network Model of Memory.

    abstract::We study the learning of an external signal by a neural network and the time to forget it when this network is submitted to noise. The presentation of an external stimulus to the recurrent network of binary neurons may change the state of the synapses. Multiple presentations of a unique signal lead to its learning. Th...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01286

    authors: Helson P

    update_date:2020-07-01 00:00:00

  • The Discriminative Kalman Filter for Bayesian Filtering with Nonlinear and Nongaussian Observation Models.

    abstract::The Kalman filter provides a simple and efficient algorithm to compute the posterior distribution for state-space models where both the latent state and measurement models are linear and gaussian. Extensions to the Kalman filter, including the extended and unscented Kalman filters, incorporate linearizations for model...

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/neco_a_01275

    authors: Burkhart MC,Brandman DM,Franco B,Hochberg LR,Harrison MT

    update_date:2020-05-01 00:00:00

  • Abstract stimulus-specific adaptation models.

    abstract::Many neurons that initially respond to a stimulus stop responding if the stimulus is presented repeatedly but recover their response if a different stimulus is presented. This phenomenon is referred to as stimulus-specific adaptation (SSA). SSA has been investigated extensively using oddball experiments, which measure...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00077

    authors: Mill R,Coath M,Wennekers T,Denham SL

    update_date:2011-02-01 00:00:00

  • On the slow convergence of EM and VBEM in low-noise linear models.

    abstract::We analyze convergence of the expectation maximization (EM) and variational Bayes EM (VBEM) schemes for parameter estimation in noisy linear models. The analysis shows that both schemes are inefficient in the low-noise limit. The linear model with additive noise includes as special cases independent component analysis...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766054322991

    authors: Petersen KB,Winther O,Hansen LK

    update_date:2005-09-01 00:00:00

  • Alignment of coexisting cortical maps in a motor control model.

    abstract::How do multiple feature maps that coexist in the same region of cerebral cortex align with each other? We hypothesize that such alignment is governed by temporal correlations: features in one map that are temporally correlated with those in another come to occupy the same spatial locations in cortex over time. To exam...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1996.8.4.731

    authors: Chen Y,Reggia JA

    update_date:1996-05-15 00:00:00

  • Approximation by fully complex multilayer perceptrons.

    abstract::We investigate the approximation ability of a multilayer perceptron (MLP) network when it is extended to the complex domain. The main challenge for processing complex data with neural networks has been the lack of bounded and analytic complex nonlinear activation functions in the complex domain, as stated by Liouville...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976603321891846

    authors: Kim T,Adali T

    update_date:2003-07-01 00:00:00

  • Scalable Semisupervised Functional Neurocartography Reveals Canonical Neurons in Behavioral Networks.

    abstract::Large-scale data collection efforts to map the brain are underway at multiple spatial and temporal scales, but all face fundamental problems posed by high-dimensional data and intersubject variability. Even seemingly simple problems, such as identifying a neuron/brain region across animals/subjects, become exponential...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00852

    authors: Frady EP,Kapoor A,Horvitz E,Kristan WB Jr

    update_date:2016-08-01 00:00:00

  • Oscillating Networks: Control of Burst Duration by Electrically Coupled Neurons.

    abstract::The pyloric network of the stomatogastric ganglion in crustacea is a central pattern generator that can produce the same basic rhythm over a wide frequency range. Three electrically coupled neurons, the anterior burster (AB) neuron and two pyloric dilator (PD) neurons, act as a pacemaker unit for the pyloric network. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1991.3.4.487

    authors: Abbott LF,Marder E,Hooper SL

    update_date:1991-01-01 00:00:00

  • The time-organized map algorithm: extending the self-organizing map to spatiotemporal signals.

    abstract::The new time-organized map (TOM) is presented for a better understanding of the self-organization and geometric structure of cortical signal representations. The algorithm extends the common self-organizing map (SOM) from the processing of purely spatial signals to the processing of spatiotemporal signals. The main ad...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976603765202695

    authors: Wiemer JC

    update_date:2003-05-01 00:00:00

  • Generalization and selection of examples in feedforward neural networks.

    abstract::In this work, we study how the selection of examples affects the learning procedure in a boolean neural network and its relationship with the complexity of the function under study and its architecture. We analyze the generalization capacity for different target functions with particular architectures through an analy...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976600300014999

    authors: Franco L,Cannas SA

    update_date:2000-10-01 00:00:00

  • Neural integrator: a sandpile model.

    abstract::We investigated a model for the neural integrator based on hysteretic units connected by positive feedback. Hysteresis is assumed to emerge from the intrinsic properties of the cells. We consider the recurrent networks containing either bistable or multistable neurons. We apply our analysis to the oculomotor velocity-...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2008.12-06-416

    authors: Nikitchenko M,Koulakov A

    update_date:2008-10-01 00:00:00