Improving generalization performance of natural gradient learning using optimized regularization by NIC.

Abstract:

Natural gradient learning is known to be efficient in escaping plateaus, which are a main cause of the slow learning speed of neural networks. An adaptive natural gradient learning method for practical implementation has also been developed, and its advantage in real-world problems has been confirmed. In this letter, we deal with the generalization performance of the natural gradient method. Since natural gradient learning makes the parameters fit the training data quickly, overfitting may easily occur, which results in poor generalization performance. To solve this problem, we introduce a regularization term into natural gradient learning and propose an efficient method for optimizing the regularization strength by using a generalized Akaike information criterion (the network information criterion, NIC). We discuss the properties of the regularization strength optimized by NIC through theoretical analysis as well as computer simulations, and we confirm the computational efficiency and generalization performance of the proposed method on real-world benchmark problems.

journal_name: Neural Comput

journal_title: Neural computation

authors: Park H, Murata N, Amari S

doi: 10.1162/089976604322742065

subject: Has Abstract

pub_date: 2004-02-01

pages: 355-382

issue: 2

issn: 0899-7667

eissn: 1530-888X

journal_volume: 16

pub_type: Journal Article
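The abstract above describes natural gradient learning with an added regularization term. As a minimal illustrative sketch (not the paper's method), the regularized natural-gradient update w ← w − η G⁻¹(∇L + λw) can be shown on a toy linear-Gaussian model, where the Fisher matrix G has a closed form. The toy data, all names, and the fixed regularization strength `lam` are assumptions; the NIC-based selection of `lam` proposed in the paper is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data: y = X @ w_true + noise.
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

def fit_natural_gradient(X, y, lam, lr=0.5, steps=100):
    """Natural-gradient descent on squared error with an L2
    regularization term of fixed strength lam (a sketch; the
    NIC-based choice of lam from the paper is not implemented)."""
    n, d = X.shape
    w = np.zeros(d)
    # For a linear-Gaussian model the Fisher matrix is X.T @ X / n.
    G = X.T @ X / n + 1e-8 * np.eye(d)
    G_inv = np.linalg.inv(G)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * w  # regularized gradient
        w -= lr * G_inv @ grad                  # natural-gradient step
    return w

w_hat = fit_natural_gradient(X, y, lam=1e-2)
```

For this linear model the iteration converges to the ridge solution (G + lam·I)⁻¹ Xᵀy/n; in the paper, the strength lam would instead be tuned by minimizing NIC rather than fixed in advance.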
  • ParceLiNGAM: a causal ordering method robust against latent confounders.

    abstract::We consider learning a causal ordering of variables in a linear nongaussian acyclic model called LiNGAM. Several methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But the estimation results could be distorted if some assumptions are violated. In thi...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00533

    authors: Tashiro T,Shimizu S,Hyvärinen A,Washio T

    Update date: 2014-01-01

  • Supervised Determined Source Separation with Multichannel Variational Autoencoder.

    abstract::This letter proposes a multichannel source separation technique, the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class label...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01217

    authors: Kameoka H,Li L,Inoue S,Makino S

    Update date: 2019-09-01

  • Inhibition and Excitation Shape Activity Selection: Effect of Oscillations in a Decision-Making Circuit.

    abstract::Decision making is a complex task, and its underlying mechanisms that regulate behavior, such as the implementation of the coupling between physiological states and neural networks, are hard to decipher. To gain more insight into neural computations underlying ongoing binary decision-making tasks, we consider a neural...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01185

    authors: Bose T,Reina A,Marshall JAR

    Update date: 2019-05-01

  • Long-term reward prediction in TD models of the dopamine system.

    abstract::This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602760407973

    authors: Daw ND,Touretzky DS

    Update date: 2002-11-01

  • Information loss in an optimal maximum likelihood decoding.

    abstract::The mutual information between a set of stimuli and the elicited neural responses is compared to the corresponding decoded information. The decoding procedure is presented as an artificial distortion of the joint probabilities between stimuli and responses. The information loss is quantified. Whenever the probabilitie...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602317318947

    authors: Samengo I

    Update date: 2002-04-01

  • Characterization of minimum error linear coding with sensory and neural noise.

    abstract::Robust coding has been proposed as a solution to the problem of minimizing decoding error in the presence of neural noise. Many real-world problems, however, have degradation in the input signal, not just in neural representations. This generalized problem is more relevant to biological sensory coding where internal n...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00181

    authors: Doi E,Lewicki MS

    Update date: 2011-10-01

  • An amplitude equation approach to contextual effects in visual cortex.

    abstract::A mathematical theory of interacting hypercolumns in primary visual cortex (V1) is presented that incorporates details concerning the anisotropic nature of long-range lateral connections. Each hypercolumn is modeled as a ring of interacting excitatory and inhibitory neural populations with orientation preferences over...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602317250870

    authors: Bressloff PC,Cowan JD

    Update date: 2002-03-01

  • On the slow convergence of EM and VBEM in low-noise linear models.

    abstract::We analyze convergence of the expectation maximization (EM) and variational Bayes EM (VBEM) schemes for parameter estimation in noisy linear models. The analysis shows that both schemes are inefficient in the low-noise limit. The linear model with additive noise includes as special cases independent component analysis...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766054322991

    authors: Petersen KB,Winther O,Hansen LK

    Update date: 2005-09-01

  • Modeling slowly bursting neurons via calcium store and voltage-independent calcium current.

    abstract::Recent experiments indicate that the calcium store (e.g., endoplasmic reticulum) is involved in electrical bursting and [Ca2+]i oscillation in bursting neuronal cells. In this paper, we formulate a mathematical model for bursting neurons, which includes Ca2+ in the intracellular Ca2+ stores and a voltage-independent c...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1996.8.5.951

    authors: Chay TR

    Update date: 1996-07-01

  • A graphical model framework for decoding in the visual ERP-based BCI speller.

    abstract::We present a graphical model framework for decoding in the visual ERP-based speller system. The proposed framework allows researchers to build generative models from which the decoding rules are obtained in a straightforward manner. We suggest two models for generating brain signals conditioned on the stimulus events....

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/NECO_a_00066

    authors: Martens SM,Mooij JM,Hill NJ,Farquhar J,Schölkopf B

    Update date: 2011-01-01

  • Variations on the Theme of Synaptic Filtering: A Comparison of Integrate-and-Express Models of Synaptic Plasticity for Memory Lifetimes.

    abstract::Integrate-and-express models of synaptic plasticity propose that synapses integrate plasticity induction signals before expressing synaptic plasticity. By discerning trends in their induction signals, synapses can control destabilizing fluctuations in synaptic strength. In a feedforward perceptron framework with binar...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00889

    authors: Elliott T

    Update date: 2016-11-01

  • Bayesian framework for least-squares support vector machine classifiers, gaussian processes, and kernel Fisher discriminant analysis.

    abstract::The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for class...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602753633411

    authors: Van Gestel T,Suykens JA,Lanckriet G,Lambrechts A,De Moor B,Vandewalle J

    Update date: 2002-05-01

  • Feature selection for ordinal text classification.

    abstract::Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. This problem is receiving increased attention from the sentiment analysis and opinion mining community due to the importance of automatically ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00558

    authors: Baccianella S,Esuli A,Sebastiani F

    Update date: 2014-03-01

  • Does high firing irregularity enhance learning?

    abstract::In this note, we demonstrate that the high firing irregularity produced by the leaky integrate-and-fire neuron with the partial somatic reset mechanism, which has been shown to be the most likely candidate to reflect the mechanism used in the brain for reproducing the highly irregular cortical neuron firing at high ra...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00090

    authors: Christodoulou C,Cleanthous A

    Update date: 2011-03-01

  • Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization.

    abstract::We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving nonconvex optimization problems. These methods search a wider range by gradually decreasing the randomness added to the standard gradient descent method. The formulation that we define on the basis of th...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01089

    authors: Takase T,Oyama S,Kurihara M

    Update date: 2018-07-01

  • Online adaptive decision trees.

    abstract::Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in the online mode, and they are not adaptive to ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766041336396

    authors: Basak J

    Update date: 2004-09-01

  • Dependence of neuronal correlations on filter characteristics and marginal spike train statistics.

    abstract::Correlated neural activity has been observed at various signal levels (e.g., spike count, membrane potential, local field potential, EEG, fMRI BOLD). Most of these signals can be considered as superpositions of spike trains filtered by components of the neural system (synapses, membranes) and the measurement process. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2008.05-07-525

    authors: Tetzlaff T,Rotter S,Stark E,Abeles M,Aertsen A,Diesmann M

    Update date: 2008-09-01

  • Effects of fast presynaptic noise in attractor neural networks.

    abstract::We study both analytically and numerically the effect of presynaptic noise on the transmission of information in attractor neural networks. The noise occurs on a very short timescale compared to that for the neuron dynamics and it produces short-time synaptic depression. This is inspired in recent neurobiological find...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976606775623342

    authors: Cortes JM,Torres JJ,Marro J,Garrido PL,Kappen HJ

    Update date: 2006-03-01

  • Adaptive Learning Algorithm Convergence in Passive and Reactive Environments.

    abstract::Although the number of artificial neural network and machine learning architectures is growing at an exponential pace, more attention needs to be paid to theoretical guarantees of asymptotic convergence for novel, nonlinear, high-dimensional adaptive learning algorithms. When properly understood, such guarantees can g...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01117

    authors: Golden RM

    Update date: 2018-10-01

  • Synchrony and desynchrony in integrate-and-fire oscillators.

    abstract::Due to many experimental reports of synchronous neural activity in the brain, there is much interest in understanding synchronization in networks of neural oscillators and its potential for computing perceptual organization. Contrary to Hopfield and Herz (1995), we find that networks of locally coupled integrate-and-f...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016160

    authors: Campbell SR,Wang DL,Jayaprakash C

    Update date: 1999-10-01

  • Delay Differential Analysis of Seizures in Multichannel Electrocorticography Data.

    abstract::High-density electrocorticogram (ECoG) electrodes are capable of recording neurophysiological data with high temporal resolution with wide spatial coverage. These recordings are a window to understanding how the human brain processes information and subsequently behaves in healthy and pathologic states. Here, we descr...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01009

    authors: Lainscsek C,Weyhenmeyer J,Cash SS,Sejnowski TJ

    Update date: 2017-12-01

  • Learning object representations using a priori constraints within ORASSYLL.

    abstract::In this article, a biologically plausible and efficient object recognition system (called ORASSYLL) is introduced, based on a set of a priori constraints motivated by findings of developmental psychology and neurophysiology. These constraints are concerned with the organization of the input in local and corresponding ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976601300014583

    authors: Krüger N

    Update date: 2001-02-01

  • A Unifying Framework of Synaptic and Intrinsic Plasticity in Neural Populations.

    abstract::A neuronal population is a computational unit that receives a multivariate, time-varying input signal and creates a related multivariate output. These neural signals are modeled as stochastic processes that transmit information in real time, subject to stochastic noise. In a stationary environment, where the input sig...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01057

    authors: Leugering J,Pipa G

    Update date: 2018-04-01

  • Local and global gating of synaptic plasticity.

    abstract::Mechanisms influencing learning in neural networks are usually investigated on either a local or a global scale. The former relates to synaptic processes, the latter to unspecific modulatory systems. Here we study the interaction of a local learning rule that evaluates coincidences of pre- and postsynaptic action pote...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976600300015682

    authors: Sánchez-Montañés MA,Verschure PF,König P

    Update date: 2000-03-01

  • A Distributed Framework for the Construction of Transport Maps.

    abstract::The need to reason about uncertainty in large, complex, and multimodal data sets has become increasingly common across modern scientific environments. The ability to transform samples from one distribution P to another distribution...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01172

    authors: Mesa DA,Tantiongloc J,Mendoza M,Kim S,P Coleman T

    Update date: 2019-04-01

  • Locality of global stochastic interaction in directed acyclic networks.

    abstract::The hypothesis of invariant maximization of interaction (IMI) is formulated within the setting of random fields. According to this hypothesis, learning processes maximize the stochastic interaction of the neurons subject to constraints. We consider the extrinsic constraint in terms of a fixed input distribution on the...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602760805368

    authors: Ay N

    Update date: 2002-12-01

  • A Mathematical Analysis of Memory Lifetime in a Simple Network Model of Memory.

    abstract::We study the learning of an external signal by a neural network and the time to forget it when this network is submitted to noise. The presentation of an external stimulus to the recurrent network of binary neurons may change the state of the synapses. Multiple presentations of a unique signal lead to its learning. Th...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01286

    authors: Helson P

    Update date: 2020-07-01

  • Robustness of connectionist swimming controllers against random variation in neural connections.

    abstract::The ability to achieve high swimming speed and efficiency is very important to both the real lamprey and its robotic implementation. In previous studies, we used evolutionary algorithms to evolve biologically plausible connectionist swimming controllers for a simulated lamprey. This letter investigates the robustness ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.6.1568

    authors: Or J

    Update date: 2007-06-01

  • A Mean-Field Description of Bursting Dynamics in Spiking Neural Networks with Short-Term Adaptation.

    abstract::Bursting plays an important role in neural communication. At the population level, macroscopic bursting has been identified in populations of neurons that do not express intrinsic bursting mechanisms. For the analysis of phase transitions between bursting and non-bursting states, mean-field descriptions of macroscopic...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01300

    authors: Gast R,Schmidt H,Knösche TR

    Update date: 2020-09-01

  • Bias/Variance Decompositions for Likelihood-Based Estimators.

    abstract::The bias/variance decomposition of mean-squared error is well understood and relatively straightforward. In this note, a similar simple decomposition is derived, valid for any kind of error measure that, when using the appropriate probability model, can be derived from a Kullback-Leibler divergence or log-likelihood. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017232

    authors: Heskes T

    Update date: 1998-07-28