Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization.

Abstract:

We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving nonconvex optimization problems. These methods search a wider range by gradually decreasing the randomness added to the standard gradient descent method. The formulation we define on the basis of this framework can be applied directly to neural network training, which yields an effective approach that gradually increases the batch size during training. We also explain why large batch training degrades generalization performance, which previous studies have not clarified.
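The strategy of gradually increasing the batch size during training can be sketched as a simple schedule. The geometric growth below (doubling every 30 epochs, capped at a maximum) is an illustrative assumption, not the authors' exact scheme; the point is that a small initial batch injects large gradient noise (wide exploration, as in high-temperature annealing), and growing the batch plays the role of the falling temperature.

```python
# Hypothetical batch-size schedule illustrating the "gradually increase
# batch size" strategy from the abstract. The concrete numbers (initial
# size, growth factor, interval, cap) are assumptions for illustration.

def batch_size_schedule(epoch, initial=64, growth_factor=2, every=30, maximum=2048):
    """Return the batch size to use at a given epoch.

    The batch size starts small (high gradient noise, wide search) and
    grows geometrically, analogous to lowering the temperature in
    simulated annealing, until it reaches a hardware-friendly cap.
    """
    size = initial * growth_factor ** (epoch // every)
    return min(size, maximum)
```

For example, `batch_size_schedule(0)` returns 64, `batch_size_schedule(30)` returns 128, and by epoch 300 the schedule has saturated at the cap of 2048.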

journal_name: Neural Comput

journal_title: Neural computation

authors: Takase T, Oyama S, Kurihara M

doi: 10.1162/neco_a_01089

subject: Has Abstract

pub_date: 2018-07-01

pages: 2005-2023

issue: 7

issn: 0899-7667

eissn: 1530-888X

journal_volume: 30

pub_type: Journal Article

Related articles:
  • Neutral stability, rate propagation, and critical branching in feedforward networks.

    abstract: Recent experimental and computational evidence suggests that several dynamical properties may characterize the operating point of functioning neural networks: critical branching, neutral stability, and production of a wide range of firing patterns. We seek the simplest setting in which these properties emerge, clarify...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00461

    authors: Cayco-Gajic NA, Shea-Brown E

    Updated: 2013-07-01

  • Active Learning for Enumerating Local Minima Based on Gaussian Process Derivatives.

    abstract: We study active learning (AL) based on gaussian processes (GPs) for efficiently enumerating all of the local minimum solutions of a black-box function. This problem is challenging because local solutions are characterized by their zero gradient and positive-definite Hessian properties, but those derivatives cannot be ...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco_a_01307

    authors: Inatsu Y, Sugita D, Toyoura K, Takeuchi I

    Updated: 2020-10-01

  • Spikernels: predicting arm movements by embedding population spike rate patterns in inner-product spaces.

    abstract: Inner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this letter is the construction of biologically motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/0899766053019944

    authors: Shpigelman L, Singer Y, Paz R, Vaadia E

    Updated: 2005-03-01

  • Neural Circuits Trained with Standard Reinforcement Learning Can Accumulate Probabilistic Information during Decision Making.

    abstract: Much experimental evidence suggests that during decision making, neural circuits accumulate evidence supporting alternative options. A computational model well describing this accumulation for choices between two options assumes that the brain integrates the log ratios of the likelihoods of the sensory inputs given th...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00917

    authors: Kurzawa N, Summerfield C, Bogacz R

    Updated: 2017-02-01

  • On the role of biophysical properties of cortical neurons in binding and segmentation of visual scenes.

    abstract: Neuroscience is progressing vigorously, and knowledge at different levels of description is rapidly accumulating. To establish relationships between results found at these different levels is one of the central challenges. In this simulation study, we demonstrate how microscopic cellular properties, taking the example...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976699300016377

    authors: Verschure PF, König P

    Updated: 1999-07-01

  • On the problem in model selection of neural network regression in overrealizable scenario.

    abstract: In considering a statistical model selection of neural networks and radial basis functions under an overrealizable case, the problem of unidentifiability emerges. Because the model selection criterion is an unbiased estimator of the generalization error based on the training error, this article analyzes the expected t...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976602760128090

    authors: Hagiwara K

    Updated: 2002-08-01

  • Direct estimation of inhomogeneous Markov interval models of spike trains.

    abstract: A necessary ingredient for a quantitative theory of neural coding is appropriate "spike kinematics": a precise description of spike trains. While summarizing experiments by complete spike time collections is clearly inefficient and probably unnecessary, the most common probabilistic model used in neurophysiology, the ...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco.2009.07-08-828

    authors: Wójcik DK, Mochol G, Jakuczun W, Wypych M, Waleszczyk WJ

    Updated: 2009-08-01

  • Capturing the Dynamical Repertoire of Single Neurons with Generalized Linear Models.

    abstract: A key problem in computational neuroscience is to find simple, tractable models that are nevertheless flexible enough to capture the response properties of real neurons. Here we examine the capabilities of recurrent point process models known as Poisson generalized linear models (GLMs). These models are defined by a s...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco_a_01021

    authors: Weber AI, Pillow JW

    Updated: 2017-12-01

  • Inhibition and Excitation Shape Activity Selection: Effect of Oscillations in a Decision-Making Circuit.

    abstract: Decision making is a complex task, and its underlying mechanisms that regulate behavior, such as the implementation of the coupling between physiological states and neural networks, are hard to decipher. To gain more insight into neural computations underlying ongoing binary decision-making tasks, we consider a neural...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco_a_01185

    authors: Bose T, Reina A, Marshall JAR

    Updated: 2019-05-01

  • Clustering based on gaussian processes.

    abstract: In this letter, we develop a gaussian process model for clustering. The variances of predictive values in gaussian processes learned from a training data are shown to comprise an estimate of the support of a probability density function. The constructed variance function is then applied to construct a set of contours ...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco.2007.19.11.3088

    authors: Kim HC, Lee J

    Updated: 2007-11-01

  • Capturing the Forest but Missing the Trees: Microstates Inadequate for Characterizing Shorter-Scale EEG Dynamics.

    abstract: The brain is known to be active even when not performing any overt cognitive tasks, and often it engages in involuntary mind wandering. This resting state has been extensively characterized in terms of fMRI-derived brain networks. However, an alternate method has recently gained popularity: EEG microstate analysis. Pr...

    journal_title: Neural computation

    pub_type: Letter

    doi: 10.1162/neco_a_01229

    authors: Shaw SB, Dhindsa K, Reilly JP, Becker S

    Updated: 2019-11-01

  • Sufficient dimension reduction via squared-loss mutual information estimation.

    abstract: The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00407

    authors: Suzuki T, Sugiyama M

    Updated: 2013-03-01

  • Approximation by fully complex multilayer perceptrons.

    abstract: We investigate the approximation ability of a multilayer perceptron (MLP) network when it is extended to the complex domain. The main challenge for processing complex data with neural networks has been the lack of bounded and analytic complex nonlinear activation functions in the complex domain, as stated by Liouville...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976603321891846

    authors: Kim T, Adali T

    Updated: 2003-07-01

  • An oscillatory Hebbian network model of short-term memory.

    abstract: Recurrent neural architectures having oscillatory dynamics use rhythmic network activity to represent patterns stored in short-term memory. Multiple stored patterns can be retained in memory over the same neural substrate because the network's state persistently switches between them. Here we present a simple oscillat...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco.2008.02-08-715

    authors: Winder RK, Reggia JA, Weems SA, Bunting MF

    Updated: 2009-03-01

  • Optimality of Upper-Arm Reaching Trajectories Based on the Expected Value of the Metabolic Energy Cost.

    abstract: When we move our body to perform a movement task, our central nervous system selects a movement trajectory from an infinite number of possible trajectories under constraints that have been acquired through evolution and learning. Minimization of the energy cost has been suggested as a potential candidate for a constra...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00757

    authors: Taniai Y, Nishii J

    Updated: 2015-08-01

  • The successor representation and temporal context.

    abstract: The successor representation was introduced into reinforcement learning by Dayan (1993) as a means of facilitating generalization between states with similar successors. Although reinforcement learning in general has been used extensively as a model of psychological and neural processes, the psychological validity o...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00282

    authors: Gershman SJ, Moore CD, Todd MT, Norman KA, Sederberg PB

    Updated: 2012-06-01

  • Rapid processing and unsupervised learning in a model of the cortical macrocolumn.

    abstract: We study a model of the cortical macrocolumn consisting of a collection of inhibitorily coupled minicolumns. The proposed system overcomes several severe deficits of systems based on single neurons as cerebral functional units, notably limited robustness to damage and unrealistically large computation time. Motivated ...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976604772744893

    authors: Lücke J, von der Malsburg C

    Updated: 2004-03-01

  • A bio-inspired, computational model suggests velocity gradients of optic flow locally encode ordinal depth at surface borders and globally they encode self-motion.

    abstract: Visual navigation requires the estimation of self-motion as well as the segmentation of objects from the background. We suggest a definition of local velocity gradients to compute types of self-motion, segment objects, and compute local properties of optical flow fields, such as divergence, curl, and shear. Such veloc...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00479

    authors: Raudies F, Ringbauer S, Neumann H

    Updated: 2013-09-01

  • Universal approximation depth and errors of narrow belief networks with discrete units.

    abstract: We generalize recent theoretical work on the minimal number of layers of narrow deep belief networks that can approximate any probability distribution on the states of their visible units arbitrarily well. We relax the setting of binary units (Sutskever & Hinton, 2008; Le Roux & Bengio, 2008, 2010; Montúfar & Ay, 2...

    journal_title: Neural computation

    pub_type: Letter

    doi: 10.1162/NECO_a_00601

    authors: Montúfar GF

    Updated: 2014-07-01

  • Bias/Variance Decompositions for Likelihood-Based Estimators.

    abstract: The bias/variance decomposition of mean-squared error is well understood and relatively straightforward. In this note, a similar simple decomposition is derived, valid for any kind of error measure that, when using the appropriate probability model, can be derived from a Kullback-Leibler divergence or log-likelihood. ...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976698300017232

    authors: Heskes T

    Updated: 1998-07-28

  • A semiparametric Bayesian model for detecting synchrony among multiple neurons.

    abstract: We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their cofiring (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1s (spike) and 0s (silence) for each neuron ...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00631

    authors: Shahbaba B, Zhou B, Lan S, Ombao H, Moorman D, Behseta S

    Updated: 2014-09-01

  • Higher-order statistics of input ensembles and the response of simple model neurons.

    abstract: Pairwise correlations among spike trains recorded in vivo have been frequently reported. It has been argued that correlated activity could play an important role in the brain, because it efficiently modulates the response of a postsynaptic neuron. We show here that a neuron's output firing rate critically depends on t...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976603321043702

    authors: Kuhn A, Aertsen A, Rotter S

    Updated: 2003-01-01

  • Feature selection in simple neurons: how coding depends on spiking dynamics.

    abstract: The relationship between a neuron's complex inputs and its spiking output defines the neuron's coding strategy. This is frequently and effectively modeled phenomenologically by one or more linear filters that extract the components of the stimulus that are relevant for triggering spikes and a nonlinear function that r...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/neco.2009.02-09-956

    authors: Famulare M, Fairhall A

    Updated: 2010-03-01

  • Reinforcement learning in continuous time and space.

    abstract: This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improv...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976600300015961

    authors: Doya K

    Updated: 2000-01-01

  • The Discriminative Kalman Filter for Bayesian Filtering with Nonlinear and Nongaussian Observation Models.

    abstract: The Kalman filter provides a simple and efficient algorithm to compute the posterior distribution for state-space models where both the latent state and measurement models are linear and gaussian. Extensions to the Kalman filter, including the extended and unscented Kalman filters, incorporate linearizations for model...

    journal_title: Neural computation

    pub_type: Letter

    doi: 10.1162/neco_a_01275

    authors: Burkhart MC, Brandman DM, Franco B, Hochberg LR, Harrison MT

    Updated: 2020-05-01

  • Hebbian learning of recurrent connections: a geometrical perspective.

    abstract: We show how a Hopfield network with modifiable recurrent connections undergoing slow Hebbian learning can extract the underlying geometry of an input space. First, we use a slow and fast analysis to derive an averaged system whose dynamics derives from an energy function and therefore always converges to equilibrium p...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00322

    authors: Galtier MN, Faugeras OD, Bressloff PC

    Updated: 2012-09-01

  • Changes in GABAB modulation during a theta cycle may be analogous to the fall of temperature during annealing.

    abstract: Changes in GABA modulation may underlie experimentally observed changes in the strength of synaptic transmission at different phases of the theta rhythm (Wyble, Linster, & Hasselmo, 1997). Analysis demonstrates that these changes improve sequence disambiguation by a neural network model of CA3. We show that in the fra...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976698300017539

    authors: Sohal VS, Hasselmo ME

    Updated: 1998-05-15

  • The relationship between synchronization among neuronal populations and their mean activity levels.

    abstract: In the past decade the importance of synchronized dynamics in the brain has emerged from both empirical and theoretical perspectives. Fast dynamic synchronous interactions of an oscillatory or nonoscillatory nature may constitute a form of temporal coding that underlies feature binding and perceptual synthesis. The re...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976699300016287

    authors: Chawla D, Lumer ED, Friston KJ

    Updated: 1999-08-15

  • Sequential Tests for Large-Scale Learning.

    abstract: We argue that when faced with big data sets, learning and inference algorithms should compute updates using only subsets of data items. We introduce algorithms that use sequential hypothesis tests to adaptively select such a subset of data points. The statistical properties of this subsampling process can be used to c...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/NECO_a_00796

    authors: Korattikara A, Chen Y, Welling M

    Updated: 2016-01-01

  • Independent component analysis: A flexible nonlinearity and decorrelating manifold approach.

    abstract: Independent component analysis (ICA) finds a linear transformation to variables that are maximally statistically independent. We examine ICA and algorithms for finding the best transformation from the point of view of maximizing the likelihood of the data. In particular, we discuss the way in which scaling of the unmi...

    journal_title: Neural computation

    pub_type: Journal Article

    doi: 10.1162/089976699300016043

    authors: Everson R, Roberts S

    Updated: 1999-11-15