Online Reinforcement Learning Using a Probability Density Estimation.

Abstract:

Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a Gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
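The second idea in the abstract, forgetting that is driven by new information rather than by elapsed time, can be illustrated with a small sketch. This is not the authors' update rule, which the abstract does not give. It is a minimal one-dimensional example under stated assumptions: each component keeps decayed sufficient statistics, and the hypothetical names `Component`, `responsibilities`, `update`, and `base_forgetting` are introduced only for illustration. The assumed rule raises the base forgetting factor to the power of the component's responsibility for the new sample, so a component that receives no new information is not decayed. The density-dependent choice of the base factor described in the abstract is not modeled here.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class Component:
    """One mixture component stored as decayed sufficient statistics."""

    def __init__(self, mean, var, count=1.0):
        self.count = count                       # accumulated sample mass
        self.sum_x = count * mean                # accumulated sum of x
        self.sum_xx = count * (var + mean * mean)  # accumulated sum of x^2

    @property
    def mean(self):
        return self.sum_x / self.count

    @property
    def var(self):
        # Floor the variance so the density stays well defined.
        return max(self.sum_xx / self.count - self.mean ** 2, 1e-6)


def responsibilities(components, x):
    """Posterior probability that each component generated sample x."""
    total_count = sum(c.count for c in components)
    scores = [c.count / total_count * gaussian_pdf(x, c.mean, c.var)
              for c in components]
    total = sum(scores)
    if total == 0.0:
        # Numerical underflow everywhere: fall back to equal shares.
        return [1.0 / len(components)] * len(components)
    return [s / total for s in scores]


def update(components, x, base_forgetting=0.95):
    """Assumed rule: forget in proportion to the information received.

    A component with responsibility w decays by base_forgetting ** w,
    so w = 0 means no forgetting and w = 1 means full forgetting.
    """
    resp = responsibilities(components, x)
    for c, w in zip(components, resp):
        decay = base_forgetting ** w
        c.count = decay * c.count + w
        c.sum_x = decay * c.sum_x + w * x
        c.sum_xx = decay * c.sum_xx + w * x * x
    return resp


if __name__ == "__main__":
    near, far = Component(0.0, 1.0), Component(10.0, 1.0)
    # Biased sampling: every sample falls close to the first component.
    for t in range(200):
        update([near, far], 0.5 if t % 2 == 0 else -0.5)
    print("near count:", near.count, "far count:", far.count)
```

With purely time-based forgetting, the statistics of the rarely visited component would shrink by a factor of `base_forgetting` at every step. Under the assumed rule they stay essentially unchanged while only the frequently sampled component is refreshed.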

journal_name

Neural Comput

journal_title

Neural computation

authors

Agostini A,Celaya E

doi

10.1162/NECO_a_00906

subject

Has Abstract

pub_date

2017-01-01 00:00:00

pages

220-246

issue

1

eissn

1530-888X

issn

0899-7667

journal_volume

29

pub_type

Journal Article
  • Neural coding: higher-order temporal patterns in the neurostatistics of cell assemblies.

    abstract::Recent advances in the technology of multiunit recordings make it possible to test Hebb's hypothesis that neurons do not function in isolation but are organized in assemblies. This has created the need for statistical approaches to detecting the presence of spatiotemporal patterns of more than two neurons in neuron sp...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976600300014872

    authors: Martignon L,Deco G,Laskey K,Diamond M,Freiwald W,Vaadia E

    update_date:2000-11-01 00:00:00

  • Sufficient dimension reduction via squared-loss mutual information estimation.

    abstract::The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00407

    authors: Suzuki T,Sugiyama M

    update_date:2013-03-01 00:00:00

  • Bayesian framework for least-squares support vector machine classifiers, gaussian processes, and kernel Fisher discriminant analysis.

    abstract::The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for class...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602753633411

    authors: Van Gestel T,Suykens JA,Lanckriet G,Lambrechts A,De Moor B,Vandewalle J

    update_date:2002-05-01 00:00:00

  • Neural Quadratic Discriminant Analysis: Nonlinear Decoding with V1-Like Computation.

    abstract::Linear-nonlinear (LN) models and their extensions have proven successful in describing transformations from stimuli to spiking responses of neurons in early stages of sensory hierarchies. Neural responses at later stages are highly nonlinear and have generally been better characterized in terms of their decoding perfo...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00890

    authors: Pagan M,Simoncelli EP,Rust NC

    update_date:2016-11-01 00:00:00

  • Attractive periodic sets in discrete-time recurrent networks (with emphasis on fixed-point stability and bifurcations in two-neuron networks).

    abstract::We perform a detailed fixed-point analysis of two-unit recurrent neural networks with sigmoid-shaped transfer functions. Using geometrical arguments in the space of transfer function derivatives, we partition the network state-space into distinct regions corresponding to stability types of the fixed points. Unlike in ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/08997660152002898

    authors: Tino P,Horne BG,Giles CL

    update_date:2001-06-01 00:00:00

  • A Reservoir Computing Model of Reward-Modulated Motor Learning and Automaticity.

    abstract::Reservoir computing is a biologically inspired class of learning algorithms in which the intrinsic dynamics of a recurrent neural network are mined to produce target time series. Most existing reservoir computing algorithms rely on fully supervised learning rules, which require access to an exact copy of the target re...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01198

    authors: Pyle R,Rosenbaum R

    update_date:2019-07-01 00:00:00

  • Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms.

    abstract::In this review, we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity (STDP). This review introduces the most influential models and focuses on two questions: To what degree are reward...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/0899766053011555

    authors: Wörgötter F,Porr B

    update_date:2005-02-01 00:00:00

  • Conditional density estimation with dimensionality reduction via squared-loss conditional entropy minimization.

    abstract::Regression aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional density is multimodal, heteroskedastic, and asymmetric. In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challen...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00683

    authors: Tangkaratt V,Xie N,Sugiyama M

    update_date:2015-01-01 00:00:00

  • Regulation of ambient GABA levels by neuron-glia signaling for reliable perception of multisensory events.

    abstract::Activities of sensory-specific cortices are known to be suppressed when presented with a different sensory modality stimulus. This is referred to as cross-modal inhibition, for which the conventional synaptic mechanism is unlikely to work. Interestingly, the cross-modal inhibition could be eliminated when presented wi...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00356

    authors: Hoshino O

    update_date:2012-11-01 00:00:00

  • Regularized neural networks: some convergence rate results.

    abstract::In a recent paper, Poggio and Girosi (1990) proposed a class of neural networks obtained from the theory of regularization. Regularized networks are capable of approximating arbitrarily well any continuous function on a compactum. In this paper we consider in detail the learning problem for the one-dimensional case. W...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1995.7.6.1225

    authors: Corradi V,White H

    update_date:1995-11-01 00:00:00

  • Synchrony and desynchrony in integrate-and-fire oscillators.

    abstract::Due to many experimental reports of synchronous neural activity in the brain, there is much interest in understanding synchronization in networks of neural oscillators and its potential for computing perceptual organization. Contrary to Hopfield and Herz (1995), we find that networks of locally coupled integrate-and-f...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016160

    authors: Campbell SR,Wang DL,Jayaprakash C

    update_date:1999-10-01 00:00:00

  • Invariant global motion recognition in the dorsal visual system: a unifying theory.

    abstract::The motion of an object (such as a wheel rotating) is seen as consistent independent of its position and size on the retina. Neurons in higher cortical visual areas respond to these global motion stimuli invariantly, but neurons in early cortical areas with small receptive fields cannot represent this motion, not only...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.1.139

    authors: Rolls ET,Stringer SM

    update_date:2007-01-01 00:00:00

  • The relationship between synchronization among neuronal populations and their mean activity levels.

    abstract::In the past decade the importance of synchronized dynamics in the brain has emerged from both empirical and theoretical perspectives. Fast dynamic synchronous interactions of an oscillatory or nonoscillatory nature may constitute a form of temporal coding that underlies feature binding and perceptual synthesis. The re...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016287

    authors: Chawla D,Lumer ED,Friston KJ

    update_date:1999-08-15 00:00:00

  • Optimal sequential detection of stimuli from multiunit recordings taken in densely populated brain regions.

    abstract::We address the problem of detecting the presence of a recurring stimulus by monitoring the voltage on a multiunit electrode located in a brain region densely populated by stimulus reactive neurons. Published experimental results suggest that under these conditions, when a stimulus is present, the measurements are gaus...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00257

    authors: Nossenson N,Messer H

    update_date:2012-04-01 00:00:00

  • Supervised Determined Source Separation with Multichannel Variational Autoencoder.

    abstract::This letter proposes a multichannel source separation technique, the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class label...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01217

    authors: Kameoka H,Li L,Inoue S,Makino S

    update_date:2019-09-01 00:00:00

  • Effects of fast presynaptic noise in attractor neural networks.

    abstract::We study both analytically and numerically the effect of presynaptic noise on the transmission of information in attractor neural networks. The noise occurs on a very short timescale compared to that for the neuron dynamics and it produces short-time synaptic depression. This is inspired in recent neurobiological find...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976606775623342

    authors: Cortes JM,Torres JJ,Marro J,Garrido PL,Kappen HJ

    update_date:2006-03-01 00:00:00

  • Independent component analysis: A flexible nonlinearity and decorrelating manifold approach.

    abstract::Independent component analysis (ICA) finds a linear transformation to variables that are maximally statistically independent. We examine ICA and algorithms for finding the best transformation from the point of view of maximizing the likelihood of the data. In particular, we discuss the way in which scaling of the unmi...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016043

    authors: Everson R,Roberts S

    update_date:1999-11-15 00:00:00

  • An amplitude equation approach to contextual effects in visual cortex.

    abstract::A mathematical theory of interacting hypercolumns in primary visual cortex (V1) is presented that incorporates details concerning the anisotropic nature of long-range lateral connections. Each hypercolumn is modeled as a ring of interacting excitatory and inhibitory neural populations with orientation preferences over...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602317250870

    authors: Bressloff PC,Cowan JD

    update_date:2002-03-01 00:00:00

  • Synaptic runaway in associative networks and the pathogenesis of schizophrenia.

    abstract::Synaptic runaway denotes the formation of erroneous synapses and premature functional decline accompanying activity-dependent learning in neural networks. This work studies synaptic runaway both analytically and numerically in binary-firing associative memory networks. It turns out that synaptic runaway is of fairly m...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/089976698300017836

    authors: Greenstein-Messica A,Ruppin E

    update_date:1998-02-15 00:00:00

  • Online adaptive decision trees.

    abstract::Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in the online mode, and they are not adaptive to ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766041336396

    authors: Basak J

    update_date:2004-09-01 00:00:00

  • Scalable Semisupervised Functional Neurocartography Reveals Canonical Neurons in Behavioral Networks.

    abstract::Large-scale data collection efforts to map the brain are underway at multiple spatial and temporal scales, but all face fundamental problems posed by high-dimensional data and intersubject variability. Even seemingly simple problems, such as identifying a neuron/brain region across animals/subjects, become exponential...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00852

    authors: Frady EP,Kapoor A,Horvitz E,Kristan WB Jr

    update_date:2016-08-01 00:00:00

  • Analysis of cluttered scenes using an elastic matching approach for stereo images.

    abstract::We present a system for the automatic interpretation of cluttered scenes containing multiple partly occluded objects in front of unknown, complex backgrounds. The system is based on an extended elastic graph matching algorithm that allows the explicit modeling of partial occlusions. Our approach extends an earlier sys...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2006.18.6.1441

    authors: Eckes C,Triesch J,von der Malsburg C

    update_date:2006-06-01 00:00:00

  • Pattern generation by two coupled time-discrete neural networks with synaptic depression.

    abstract::Numerous animal behaviors, such as locomotion in vertebrates, are produced by rhythmic contractions that alternate between two muscle groups. The neuronal networks generating such alternate rhythmic activity are generally thought to rely on pacemaker cells or well-designed circuits consisting of inhibitory and excitat...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017449

    authors: Senn W,Wannier T,Kleinle J,Lüscher HR,Müller L,Streit J,Wyler K

    update_date:1998-07-01 00:00:00

  • ISO learning approximates a solution to the inverse-controller problem in an unsupervised behavioral paradigm.

    abstract::In "Isotropic Sequence Order Learning" (pp. 831-864 in this issue), we introduced a novel algorithm for temporal sequence learning (ISO learning). Here, we embed this algorithm into a formal nonevaluating (teacher free) environment, which establishes a sensor-motor feedback. The system is initially guided by a fixed r...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/08997660360581930

    authors: Porr B,von Ferber C,Wörgötter F

    update_date:2003-04-01 00:00:00

  • Sequential Tests for Large-Scale Learning.

    abstract::We argue that when faced with big data sets, learning and inference algorithms should compute updates using only subsets of data items. We introduce algorithms that use sequential hypothesis tests to adaptively select such a subset of data points. The statistical properties of this subsampling process can be used to c...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00796

    authors: Korattikara A,Chen Y,Welling M

    update_date:2016-01-01 00:00:00

  • A causal perspective on the analysis of signal and noise correlations and their role in population coding.

    abstract::The role of correlations between neuronal responses is crucial to understanding the neural code. A framework used to study this role comprises a breakdown of the mutual information between stimuli and responses into terms that aim to account for different coding modalities and the distinction between different notions...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00588

    authors: Chicharro D

    update_date:2014-06-01 00:00:00

  • Feature selection in simple neurons: how coding depends on spiking dynamics.

    abstract::The relationship between a neuron's complex inputs and its spiking output defines the neuron's coding strategy. This is frequently and effectively modeled phenomenologically by one or more linear filters that extract the components of the stimulus that are relevant for triggering spikes and a nonlinear function that r...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.02-09-956

    authors: Famulare M,Fairhall A

    update_date:2010-03-01 00:00:00

  • Oscillating Networks: Control of Burst Duration by Electrically Coupled Neurons.

    abstract::The pyloric network of the stomatogastric ganglion in crustacea is a central pattern generator that can produce the same basic rhythm over a wide frequency range. Three electrically coupled neurons, the anterior burster (AB) neuron and two pyloric dilator (PD) neurons, act as a pacemaker unit for the pyloric network. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1991.3.4.487

    authors: Abbott LF,Marder E,Hooper SL

    update_date:1991-01-01 00:00:00

  • Competition between synaptic depression and facilitation in attractor neural networks.

    abstract::We study the effect of competition between short-term synaptic depression and facilitation on the dynamic properties of attractor neural networks, using Monte Carlo simulation and a mean-field analysis. Depending on the balance of depression, facilitation, and the underlying noise, the network displays different behav...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.10.2739

    authors: Torres JJ,Cortes JM,Marro J,Kappen HJ

    update_date:2007-10-01 00:00:00

  • Toward a biophysically plausible bidirectional Hebbian rule.

    abstract::Although the commonly used quadratic Hebbian-anti-Hebbian rules lead to successful models of plasticity and learning, they are inconsistent with neurophysiology. Other rules, more physiologically plausible, fail to specify the biological mechanism of bidirectionality and the biological mechanism that prevents synapses...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017629

    authors: Grzywacz NM,Burgi PY

    update_date:1998-04-01 00:00:00