Machine Learning: Deepest Learning as Statistical Data Assimilation Problems.

Abstract:

We formulate an equivalence between machine learning and statistical data assimilation as used widely in the physical and biological sciences. The correspondence is that layer number in a feedforward artificial network setting is the analog of time in the data assimilation setting. This connection has been noted in the machine learning literature. We add a perspective that expands on how methods from statistical physics and aspects of Lagrangian and Hamiltonian dynamics play a role in how networks can be trained and designed. Within the discussion of this equivalence, we show that adding more layers (making the network deeper) is analogous to adding temporal resolution in a data assimilation framework. Extending this equivalence to recurrent networks is also discussed. We explore how one can find a candidate for the global minimum of the cost function in the machine learning context using a method from data assimilation. Calculations on simple models from both sides of the equivalence are reported. Also discussed is a framework in which the time or layer label is taken to be continuous, yielding a differential equation, the Euler-Lagrange equation, and its boundary conditions as a necessary condition for a minimum of the cost function. This shows that the problem being solved is a two-point boundary value problem, familiar in the discussion of variational methods. The use of continuous layers is denoted "deepest learning." These problems respect a symplectic symmetry in continuous-layer phase space. Both Lagrangian and Hamiltonian versions of these problems are presented. Their implementation in discrete time/layer, while respecting the symplectic structure, is also addressed. The Hamiltonian version provides a direct rationale for backpropagation as a solution method for a certain two-point boundary value problem.
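The layer-as-time correspondence at the heart of the abstract can be sketched in a few lines of code. This is not the authors' implementation; it is a minimal illustration, with hypothetical names (`step`, `forward`, `cost`, `W`, `b`), of a feedforward network written as a discrete-time dynamical system x[l+1] = f(x[l]; W[l], b[l]), so that each layer plays the role of one integration step and the training objective is a data-assimilation-style misfit at the final "time."

```python
import numpy as np

rng = np.random.default_rng(0)

L = 4      # number of layers ~ number of time steps
dim = 3    # state dimension carried through every layer/"time"

# One weight matrix and bias per layer-to-layer "time step".
W = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(L)]
b = [np.zeros(dim) for _ in range(L)]

def step(x, l):
    """One layer = one step of the discrete-time map x -> tanh(W x + b)."""
    return np.tanh(W[l] @ x + b[l])

def forward(x0):
    """Integrate the 'dynamics' through all layers, keeping the trajectory."""
    traj = [x0]
    for l in range(L):
        traj.append(step(traj[-1], l))
    return traj

def cost(x0, target):
    """Data-assimilation-style cost: misfit between the final state
    (the network output) and an observed target at the final time/layer."""
    xL = forward(x0)[-1]
    return 0.5 * np.sum((xL - target) ** 2)

x0 = rng.normal(size=dim)   # "initial condition" = network input
target = np.zeros(dim)      # "observation" at the final time/layer
print(f"cost = {cost(x0, target):.4f}")
```

In this picture, making the network deeper (larger `L`) refines the temporal resolution of the integration, which is the analogy the paper develops; minimizing `cost` subject to the layer dynamics is then a two-point boundary value problem in the variational sense.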

journal_name: Neural Comput
journal_title: Neural computation
authors: Abarbanel HDI, Rozdeba PJ, Shirman S
doi: 10.1162/neco_a_01094
pub_date: 2018-08-01
pages: 2025-2055
issue: 8
issn: 0899-7667
eissn: 1530-888X
journal_volume: 30
pub_type: Journal Article
  • Replicating receptive fields of simple and complex cells in primary visual cortex in a neuronal network model with temporal and population sparseness and reliability.

    abstract::We propose a new principle for replicating receptive field properties of neurons in the primary visual cortex. We derive a learning rule for a feedforward network, which maintains a low firing rate for the output neurons (resulting in temporal sparseness) and allows only a small subset of the neurons in the network to...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00341

    authors: Tanaka T,Aoyagi T,Kaneko T

    update_date: 2012-10-01

  • An integral upper bound for neural network approximation.

    abstract::Complexity of one-hidden-layer networks is studied using tools from nonlinear approximation and integration theory. For functions with suitable integral representations in the form of networks with infinitely many hidden units, upper bounds are derived on the speed of decrease of approximation error as the number of n...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.04-08-745

    authors: Kainen PC,Kůrková V

    update_date: 2009-10-01

  • Locality of global stochastic interaction in directed acyclic networks.

    abstract::The hypothesis of invariant maximization of interaction (IMI) is formulated within the setting of random fields. According to this hypothesis, learning processes maximize the stochastic interaction of the neurons subject to constraints. We consider the extrinsic constraint in terms of a fixed input distribution on the...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602760805368

    authors: Ay N

    update_date: 2002-12-01

  • Range-based ICA using a nonsmooth quasi-newton optimizer for electroencephalographic source localization in focal epilepsy.

    abstract::Independent component analysis (ICA) aims at separating a multivariate signal into independent nongaussian signals by optimizing a contrast function with no knowledge on the mixing mechanism. Despite the availability of a constellation of contrast functions, a Hartley-entropy-based ICA contrast endowed with the discri...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00700

    authors: Selvan SE,George ST,Balakrishnan R

    update_date: 2015-03-01

  • The Ornstein-Uhlenbeck process does not reproduce spiking statistics of neurons in prefrontal cortex.

    abstract::Cortical neurons of behaving animals generate irregular spike sequences. Recently, there has been a heated discussion about the origin of this irregularity. Softky and Koch (1993) pointed out the inability of standard single-neuron models to reproduce the irregularity of the observed spike sequences when the model par...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016511

    authors: Shinomoto S,Sakai Y,Funahashi S

    update_date: 1999-05-15

  • Improving generalization performance of natural gradient learning using optimized regularization by NIC.

    abstract::Natural gradient learning is known to be efficient in escaping plateau, which is a main cause of the slow learning speed of neural networks. The adaptive natural gradient learning method for practical implementation also has been developed, and its advantage in real-world problems has been confirmed. In this letter, w...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976604322742065

    authors: Park H,Murata N,Amari S

    update_date: 2004-02-01

  • Extraction of Synaptic Input Properties in Vivo.

    abstract::Knowledge of synaptic input is crucial for understanding synaptic integration and ultimately neural function. However, in vivo, the rates at which synaptic inputs arrive are high, so that it is typically impossible to detect single events. We show here that it is nevertheless possible to extract the properties of the ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00975

    authors: Puggioni P,Jelitai M,Duguid I,van Rossum MCW

    update_date: 2017-07-01

  • Estimating spiking irregularities under changing environments.

    abstract::We considered a gamma distribution of interspike intervals as a statistical model for neuronal spike generation. A gamma distribution is a natural extension of the Poisson process taking the effect of a refractory period into account. The model is specified by two parameters: a time-dependent firing rate and a shape p...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2006.18.10.2359

    authors: Miura K,Okada M,Amari S

    update_date: 2006-10-01

  • On the slow convergence of EM and VBEM in low-noise linear models.

    abstract::We analyze convergence of the expectation maximization (EM) and variational Bayes EM (VBEM) schemes for parameter estimation in noisy linear models. The analysis shows that both schemes are inefficient in the low-noise limit. The linear model with additive noise includes as special cases independent component analysis...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766054322991

    authors: Petersen KB,Winther O,Hansen LK

    update_date: 2005-09-01

  • NMDA Receptor Alterations After Mild Traumatic Brain Injury Induce Deficits in Memory Acquisition and Recall.

    abstract::Mild traumatic brain injury (mTBI) presents a significant health concern with potential persisting deficits that can last decades. Although a growing body of literature improves our understanding of the brain network response and corresponding underlying cellular alterations after injury, the effects of cellular disru...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01343

    authors: Gabrieli D,Schumm SN,Vigilante NF,Meaney DF

    update_date: 2021-01-01

  • Investigating the fault tolerance of neural networks.

    abstract::Particular levels of partial fault tolerance (PFT) in feedforward artificial neural networks of a given size can be obtained by redundancy (replicating a smaller normally trained network), by design (training specifically to increase PFT), and by a combination of the two (replicating a smaller PFT-trained network). Th...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766053723096

    authors: Tchernev EB,Mulvaney RG,Phatak DS

    update_date: 2005-07-01

  • Regularized neural networks: some convergence rate results.

    abstract::In a recent paper, Poggio and Girosi (1990) proposed a class of neural networks obtained from the theory of regularization. Regularized networks are capable of approximating arbitrarily well any continuous function on a compactum. In this paper we consider in detail the learning problem for the one-dimensional case. W...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1995.7.6.1225

    authors: Corradi V,White H

    update_date: 1995-11-01

  • Enhanced stimulus encoding capabilities with spectral selectivity in inhibitory circuits by STDP.

    abstract::The ability to encode and transmit a signal is an essential property that many neuronal circuits in sensory areas must demonstrate, in addition to any processing they may provide. It is known that an appropriate level of lateral inhibition, as observed in these areas, can significantly improve the encoding ability of a...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00100

    authors: Coulon A,Beslon G,Soula HA

    update_date: 2011-04-01

  • Cortical spatiotemporal dimensionality reduction for visual grouping.

    abstract::The visual systems of many mammals, including humans, are able to integrate the geometric information of visual stimuli and perform cognitive tasks at the first stages of the cortical processing. This is thought to be the result of a combination of mechanisms, which include feature extraction at the single cell level ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00738

    authors: Cocci G,Barbieri D,Citti G,Sarti A

    update_date: 2015-06-01

  • Synaptic runaway in associative networks and the pathogenesis of schizophrenia.

    abstract::Synaptic runaway denotes the formation of erroneous synapses and premature functional decline accompanying activity-dependent learning in neural networks. This work studies synaptic runaway both analytically and numerically in binary-firing associative memory networks. It turns out that synaptic runaway is of fairly m...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/089976698300017836

    authors: Greenstein-Messica A,Ruppin E

    update_date: 1998-02-15

  • Estimating a state-space model from point process observations: a note on convergence.

    abstract::Physiological signals such as neural spikes and heartbeats are discrete events in time, driven by continuous underlying systems. A recently introduced data-driven model to analyze such a system is a state-space model with point process observations, parameters of which and the underlying state sequence are simultaneou...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2010.07-09-1047

    authors: Yuan K,Niranjan M

    update_date: 2010-08-01

  • Nonlinear and noisy extension of independent component analysis: theory and its application to a pitch sensation model.

    abstract::In this letter, we propose a noisy nonlinear version of independent component analysis (ICA). Assuming that the probability density function (p. d. f.) of sources is known, a learning rule is derived based on maximum likelihood estimation (MLE). Our model involves some algorithms of noisy linear ICA (e. g., Bermond & ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766052530866

    authors: Maeda S,Song WJ,Ishii S

    update_date: 2005-01-01

  • Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes.

    abstract::We extend the neural Turing machine (NTM) model into a dynamic neural Turing machine (D-NTM) by introducing trainable address vectors. This addressing scheme maintains for each memory cell two separate vectors, content and address vectors. This allows the D-NTM to learn a wide variety of location-based addressing stra...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01060

    authors: Gulcehre C,Chandar S,Cho K,Bengio Y

    update_date: 2018-04-01

  • Neuronal assembly dynamics in supervised and unsupervised learning scenarios.

    abstract::The dynamic formation of groups of neurons--neuronal assemblies--is believed to mediate cognitive phenomena at many levels, but their detailed operation and mechanisms of interaction are still to be uncovered. One hypothesis suggests that synchronized oscillations underpin their formation and functioning, with a focus...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00502

    authors: Moioli RC,Husbands P

    update_date: 2013-11-01

  • A semiparametric Bayesian model for detecting synchrony among multiple neurons.

    abstract::We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their cofiring (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1s (spike) and 0s (silence) for each neuron ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00631

    authors: Shahbaba B,Zhou B,Lan S,Ombao H,Moorman D,Behseta S

    update_date: 2014-09-01

  • Employing the zeta-transform to optimize the calculation of the synaptic conductance of NMDA and other synaptic channels in network simulations.

    abstract::Calculation of the total conductance change induced by multiple synapses at a given membrane compartment remains one of the most time-consuming processes in biophysically realistic neural network simulations. Here we show that this calculation can be achieved in a highly efficient way even for multiply converging syna...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017061

    authors: Köhn J,Wörgötter F

    update_date: 1998-10-01

  • ASIC Implementation of a Nonlinear Dynamical Model for Hippocampal Prosthesis.

    abstract::A hippocampal prosthesis is a very large scale integration (VLSI) biochip that needs to be implanted in the biological brain to solve a cognitive dysfunction. In this letter, we propose a novel low-complexity, small-area, and low-power programmable hippocampal neural network application-specific integrated circuit (AS...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01107

    authors: Qiao Z,Han Y,Han X,Xu H,Li WXY,Song D,Berger TW,Cheung RCC

    update_date: 2018-09-01

  • Adaptive Learning Algorithm Convergence in Passive and Reactive Environments.

    abstract::Although the number of artificial neural network and machine learning architectures is growing at an exponential pace, more attention needs to be paid to theoretical guarantees of asymptotic convergence for novel, nonlinear, high-dimensional adaptive learning algorithms. When properly understood, such guarantees can g...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01117

    authors: Golden RM

    update_date: 2018-10-01

  • A neurocomputational model for cocaine addiction.

    abstract::Based on the dopamine hypotheses of cocaine addiction and the assumption of decrement of brain reward system sensitivity after long-term drug exposure, we propose a computational model for cocaine addiction. Utilizing average reward temporal difference reinforcement learning, we incorporate the elevation of basal rewa...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.10-08-882

    authors: Dezfouli A,Piray P,Keramati MM,Ekhtiari H,Lucas C,Mokri A

    update_date: 2009-10-01

  • Hybrid integrate-and-fire model of a bursting neuron.

    abstract::We present a reduction of a Hodgkin-Huxley (HH)--style bursting model to a hybridized integrate-and-fire (IF) formalism based on a thorough bifurcation analysis of the neuron's dynamics. The model incorporates HH--style equations to evolve the subthreshold currents and includes IF mechanisms to characterize spike even...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976603322518768

    authors: Breen BJ,Gerken WC,Butera RJ Jr

    update_date: 2003-12-01

  • Neural Quadratic Discriminant Analysis: Nonlinear Decoding with V1-Like Computation.

    abstract::Linear-nonlinear (LN) models and their extensions have proven successful in describing transformations from stimuli to spiking responses of neurons in early stages of sensory hierarchies. Neural responses at later stages are highly nonlinear and have generally been better characterized in terms of their decoding perfo...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00890

    authors: Pagan M,Simoncelli EP,Rust NC

    update_date: 2016-11-01

  • A computational model for rhythmic and discrete movements in uni- and bimanual coordination.

    abstract::Current research on discrete and rhythmic movements differs in both experimental procedures and theory, despite the ubiquitous overlap between discrete and rhythmic components in everyday behaviors. Models of rhythmic movements usually use oscillatory systems mimicking central pattern generators (CPGs). In contrast, m...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2008.03-08-720

    authors: Ronsse R,Sternad D,Lefèvre P

    update_date: 2009-05-01

  • Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization.

    abstract::We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving nonconvex optimization problems. These methods search a wider range by gradually decreasing the randomness added to the standard gradient descent method. The formulation that we define on the basis of th...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01089

    authors: Takase T,Oyama S,Kurihara M

    update_date: 2018-07-01

  • Conductance-based integrate-and-fire models.

    abstract::A conductance-based model of Na+ and K+ currents underlying action potential generation is introduced by simplifying the quantitative model of Hodgkin and Huxley (HH). If the time course of rate constants can be approximated by a pulse, HH equations can be solved analytically. Pulse-based (PB) models generate action p...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1997.9.3.503

    authors: Destexhe A

    update_date: 1997-04-01

  • Change-based inference in attractor nets: linear analysis.

    abstract::One standard interpretation of networks of cortical neurons is that they form dynamical attractors. Computations such as stimulus estimation are performed by mapping inputs to points on the networks' attractive manifolds. These points represent population codes for the stimulus values. However, this standard interpret...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00051

    authors: Moazzezi R,Dayan P

    update_date: 2010-12-01