Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning.

Abstract:

Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution, which corresponds to the sensitivity of the distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate γ for the value functions close to 1, these algorithms do not permit γ to be set exactly at γ = 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD), a useful form of the derivative of the stationary state distribution, through a backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting γ = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms on simple benchmark tasks and show that they can improve the performance of existing PG methods.
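Two identities sit behind this abstract, and a toy problem can check both numerically.

1. Write the average reward as η(θ) = Σ_s d(s) Σ_a π(a|s) r(s,a). Differentiating gives ∇η = Σ_{s,a} d(s) π(a|s) r(s,a) (∇log d(s) + ∇log π(a|s)). Once the LSD ∇log d(s) is available, the gradient therefore needs no value function, which is the γ = 0 case.
2. Differentiate the stationarity condition d(s') = Σ_{s,a} d(s) π(a|s) p(s'|s,a). Because p does not depend on θ, this gives a backward recursion: ∇log d(s') = Σ_{s,a} q(s,a|s') (∇log d(s) + ∇log π(a|s)), where q(s,a|s') = d(s) π(a|s) p(s'|s,a) / d(s').

The following is a minimal sketch under stated assumptions:
- The MDP is a hypothetical 2-state, 2-action problem with made-up numbers.
- The policy is a tabular softmax.
- The LSD is computed by central finite differences, not by the temporal-difference estimator the article proposes.

All function names are illustrative and not taken from the article.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the article).
# P[s][a][s2] is p(s2 | s, a); R[s][a] is r(s, a).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[1.0, 0.0], [0.0, 2.0]]
N_S, N_A = 2, 2

def policy(theta):
    """Tabular softmax policy: theta[s * N_A + a] is the preference for a in s."""
    pi = []
    for s in range(N_S):
        prefs = [theta[s * N_A + a] for a in range(N_A)]
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        pi.append([e / sum(exps) for e in exps])
    return pi

def stationary(theta, iters=500):
    """Stationary state distribution d(s) of the policy's Markov chain (power iteration)."""
    pi = policy(theta)
    d = [1.0 / N_S] * N_S
    for _ in range(iters):
        d = [sum(d[s] * pi[s][a] * P[s][a][s2] for s in range(N_S) for a in range(N_A))
             for s2 in range(N_S)]
    return d

def avg_reward(theta):
    """Average reward eta = sum_s d(s) sum_a pi(a|s) r(s, a)."""
    d, pi = stationary(theta), policy(theta)
    return sum(d[s] * pi[s][a] * R[s][a] for s in range(N_S) for a in range(N_A))

def grad_log_pi(theta, s, a):
    """Analytic gradient of log pi(a|s) for the tabular softmax."""
    pi = policy(theta)
    g = [0.0] * len(theta)
    for b in range(N_A):
        g[s * N_A + b] = (1.0 if a == b else 0.0) - pi[s][b]
    return g

def central_diff(f, theta, k, eps=1e-5):
    """Central finite difference of the scalar function f along parameter k."""
    tp, tm = list(theta), list(theta)
    tp[k] += eps
    tm[k] -= eps
    return (f(tp) - f(tm)) / (2 * eps)

def lsd(theta):
    """LSD[s][k] = d log d(s) / d theta_k, here by finite differences
    (a stand-in for the article's TD-based estimator)."""
    return [[central_diff(lambda t: math.log(stationary(t)[s]), theta, k)
             for k in range(len(theta))] for s in range(N_S)]

def grad_eta_lsd(theta, use_lsd=True):
    """gamma = 0 form: sum_{s,a} d pi r (grad log d(s) + grad log pi(a|s)).
    With use_lsd=False only the likelihood-ratio term grad log pi is kept."""
    d, pi, L = stationary(theta), policy(theta), lsd(theta)
    g = [0.0] * len(theta)
    for s in range(N_S):
        for a in range(N_A):
            glp = grad_log_pi(theta, s, a)
            w = d[s] * pi[s][a] * R[s][a]
            for k in range(len(theta)):
                g[k] += w * ((L[s][k] if use_lsd else 0.0) + glp[k])
    return g

def backward_lsd(theta):
    """Right-hand side of the backward recursion
    grad log d(s2) = sum_{s,a} q(s,a|s2) (grad log d(s) + grad log pi(a|s)),
    with time-reversed probabilities q(s,a|s2) = d(s) pi(a|s) p(s2|s,a) / d(s2)."""
    d, pi, L = stationary(theta), policy(theta), lsd(theta)
    out = [[0.0] * len(theta) for _ in range(N_S)]
    for s2 in range(N_S):
        for s in range(N_S):
            for a in range(N_A):
                q = d[s] * pi[s][a] * P[s][a][s2] / d[s2]
                glp = grad_log_pi(theta, s, a)
                for k in range(len(theta)):
                    out[s2][k] += q * (L[s][k] + glp[k])
    return out

theta = [0.3, -0.2, 0.1, 0.5]
print([central_diff(avg_reward, theta, k) for k in range(len(theta))])  # finite-difference grad eta
print(grad_eta_lsd(theta))                                              # LSD-based grad eta
```

The recursion alone fixes the LSD only up to an additive constant. The constant is pinned down by Σ_s d(s) ∇log d(s) = ∇Σ_s d(s) = 0.

Calling the hypothetical grad_eta_lsd with use_lsd=False keeps only the Σ d π r ∇log π term. On this toy MDP the result then no longer matches the finite-difference gradient, which illustrates the omitted term the abstract describes. Finite differences are usable here only because the model is known.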

journal_name

Neural Comput

journal_title

Neural computation

authors

Morimura T,Uchibe E,Yoshimoto J,Peters J,Doya K

doi

10.1162/neco.2009.12-08-922

pub_date

2010-02-01

pages

342-376

issue

2

eissn

1530-888X

issn

0899-7667

journal_volume

22

pub_type

Journal Article
  • Modeling sensorimotor learning with linear dynamical systems.

    abstract::Recent studies have employed simple linear dynamical systems to model trial-by-trial dynamics in various sensorimotor learning tasks. Here we explore the theoretical and practical considerations that arise when employing the general class of linear dynamical systems (LDS) as a model for sensorimotor learning. In this ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976606775774651

    authors: Cheng S,Sabes PN

    update_date:2006-04-01

  • Design of charge-balanced time-optimal stimuli for spiking neuron oscillators.

    abstract::In this letter, we investigate the fundamental limits on how the interspike time of a neuron oscillator can be perturbed by the application of a bounded external control input (a current stimulus) with zero net electric charge accumulation. We use phase models to study the dynamics of neurons and derive charge-balance...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00643

    authors: Dasanayake IS,Li JS

    update_date:2014-10-01

  • A Resource-Allocating Network for Function Interpolation.

    abstract::We have created a network that allocates a new computational unit whenever an unusual pattern is presented to the network. This network forms compact representations, yet learns easily and rapidly. The network can be used at any time in the learning process and the learning patterns do not have to be repeated. The uni...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1991.3.2.213

    authors: Platt J

    update_date:1991-07-01

  • Bayesian framework for least-squares support vector machine classifiers, gaussian processes, and kernel Fisher discriminant analysis.

    abstract::The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for class...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602753633411

    authors: Van Gestel T,Suykens JA,Lanckriet G,Lambrechts A,De Moor B,Vandewalle J

    update_date:2002-05-01

  • Changes in GABAB modulation during a theta cycle may be analogous to the fall of temperature during annealing.

    abstract::Changes in GABA modulation may underlie experimentally observed changes in the strength of synaptic transmission at different phases of the theta rhythm (Wyble, Linster, & Hasselmo, 1997). Analysis demonstrates that these changes improve sequence disambiguation by a neural network model of CA3. We show that in the fra...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017539

    authors: Sohal VS,Hasselmo ME

    update_date:1998-05-15

  • Spiking neural P systems with astrocytes.

    abstract::In a biological nervous system, astrocytes play an important role in the functioning and interaction of neurons, and astrocytes have excitatory and inhibitory influence on synapses. In this work, with this biological inspiration, a class of computation devices that consist of neurons and astrocytes is introduced, call...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00238

    authors: Pan L,Wang J,Hoogeboom HJ

    update_date:2012-03-01

  • Discrete states of synaptic strength in a stochastic model of spike-timing-dependent plasticity.

    abstract::A stochastic model of spike-timing-dependent plasticity (STDP) postulates that single synapses presented with a single spike pair exhibit all-or-none quantal jumps in synaptic strength. The amplitudes of the jumps are independent of spiking timing, but their probabilities do depend on spiking timing. By making the amp...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.07-08-814

    authors: Elliott T

    update_date:2010-01-01

  • Positive Neural Networks in Discrete Time Implement Monotone-Regular Behaviors.

    abstract::We study the expressive power of positive neural networks. The model uses positive connection weights and multiple input neurons. Different behaviors can be expressed by varying the connection weights. We show that in discrete time and in the absence of noise, the class of positive neural networks captures the so-call...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00789

    authors: Ameloot TJ,Van den Bussche J

    update_date:2015-12-01

  • Replicating receptive fields of simple and complex cells in primary visual cortex in a neuronal network model with temporal and population sparseness and reliability.

    abstract::We propose a new principle for replicating receptive field properties of neurons in the primary visual cortex. We derive a learning rule for a feedforward network, which maintains a low firing rate for the output neurons (resulting in temporal sparseness) and allows only a small subset of the neurons in the network to...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00341

    authors: Tanaka T,Aoyagi T,Kaneko T

    update_date:2012-10-01

  • Bias/Variance Decompositions for Likelihood-Based Estimators.

    abstract::The bias/variance decomposition of mean-squared error is well understood and relatively straightforward. In this note, a similar simple decomposition is derived, valid for any kind of error measure that, when using the appropriate probability model, can be derived from a Kullback-Leibler divergence or log-likelihood. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017232

    authors: Heskes T

    update_date:1998-07-28

  • Neural Quadratic Discriminant Analysis: Nonlinear Decoding with V1-Like Computation.

    abstract::Linear-nonlinear (LN) models and their extensions have proven successful in describing transformations from stimuli to spiking responses of neurons in early stages of sensory hierarchies. Neural responses at later stages are highly nonlinear and have generally been better characterized in terms of their decoding perfo...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00890

    authors: Pagan M,Simoncelli EP,Rust NC

    update_date:2016-11-01

  • Robust boosting algorithm against mislabeling in multiclass problems.

    abstract::We discuss robustness against mislabeling in multiclass labels for classification problems and propose two algorithms of boosting, the normalized Eta-Boost.M and Eta-Boost.M, based on the Eta-divergence. Those two boosting algorithms are closely related to models of mislabeling in which the label is erroneously exchan...

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/neco.2007.11-06-400

    authors: Takenouchi T,Eguchi S,Murata N,Kanamori T

    update_date:2008-06-01

  • Maintaining Consistency of Spatial Information in the Hippocampal Network: A Combinatorial Geometry Model.

    abstract::Place cells in the rat hippocampus play a key role in creating the animal's internal representation of the world. During active navigation, these cells spike only in discrete locations, together encoding a map of the environment. Electrophysiological recordings have shown that the animal can revisit this map mentally ...

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/NECO_a_00840

    authors: Dabaghian Y

    update_date:2016-06-01

  • Bayesian model assessment and comparison using cross-validation predictive densities.

    abstract::In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/08997660260293292

    authors: Vehtari A,Lampinen J

    update_date:2002-10-01

  • Does high firing irregularity enhance learning?

    abstract::In this note, we demonstrate that the high firing irregularity produced by the leaky integrate-and-fire neuron with the partial somatic reset mechanism, which has been shown to be the most likely candidate to reflect the mechanism used in the brain for reproducing the highly irregular cortical neuron firing at high ra...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00090

    authors: Christodoulou C,Cleanthous A

    update_date:2011-03-01

  • Gaussian process approach to spiking neurons for inhomogeneous Poisson inputs.

    abstract::This article presents a new theoretical framework to consider the dynamics of a stochastic spiking neuron model with general membrane response to input spike. We assume that the input spikes obey an inhomogeneous Poisson process. The stochastic process of the membrane potential then becomes a gaussian process. When a ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976601317098529

    authors: Amemori KI,Ishii S

    update_date:2001-12-01

  • Normalization enables robust validation of disparity estimates from neural populations.

    abstract::Binocular fusion takes place over a limited region smaller than one degree of visual angle (Panum's fusional area), which is on the order of the range of preferred disparities measured in populations of disparity-tuned neurons in the visual cortex. However, the actual range of binocular disparities encountered in natu...

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/neco.2008.05-07-532

    authors: Tsang EK,Shi BE

    update_date:2008-10-01

  • Multistability in spiking neuron models of delayed recurrent inhibitory loops.

    abstract::We consider the effect of the effective timing of a delayed feedback on the excitatory neuron in a recurrent inhibitory loop, when biological realities of firing and absolute refractory period are incorporated into a phenomenological spiking linear or quadratic integrate-and-fire neuron model. We show that such models...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.8.2124

    authors: Ma J,Wu J

    update_date:2007-08-01

  • Selectivity and stability via dendritic nonlinearity.

    abstract::Inspired by recent studies regarding dendritic computation, we constructed a recurrent neural network model incorporating dendritic lateral inhibition. Our model consists of an input layer and a neuron layer that includes excitatory cells and an inhibitory cell; this inhibitory cell is activated by the pooled activiti...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.7.1798

    authors: Morita K,Okada M,Aihara K

    update_date:2007-07-01

  • Pattern generation by two coupled time-discrete neural networks with synaptic depression.

    abstract::Numerous animal behaviors, such as locomotion in vertebrates, are produced by rhythmic contractions that alternate between two muscle groups. The neuronal networks generating such alternate rhythmic activity are generally thought to rely on pacemaker cells or well-designed circuits consisting of inhibitory and excitat...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017449

    authors: Senn W,Wannier T,Kleinle J,Lüscher HR,Müller L,Streit J,Wyler K

    update_date:1998-07-01

  • Statistical computer model analysis of the reciprocal and recurrent inhibitions of the Ia-EPSP in α-motoneurons.

    abstract::We simulate the inhibition of Ia-glutamatergic excitatory postsynaptic potential (EPSP) by preceding it with glycinergic recurrent (REN) and reciprocal (REC) inhibitory postsynaptic potentials (IPSPs). The inhibition is evaluated in the presence of voltage-dependent conductances of sodium, delayed rectifier potassium,...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00375

    authors: Gradwohl G,Grossman Y

    update_date:2013-01-01

  • Patterns of synchrony in neural networks with spike adaptation.

    abstract::We study the emergence of synchronized burst activity in networks of neurons with spike adaptation. We show that networks of tonically firing adapting excitatory neurons can evolve to a state where the neurons burst in a synchronized manner. The mechanism leading to this burst activity is analyzed in a network of inte...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/08997660151134280

    authors: van Vreeswijk C,Hansel D

    update_date:2001-05-01

  • On the emergence of rules in neural networks.

    abstract::A simple associationist neural network learns to factor abstract rules (i.e., grammars) from sequences of arbitrary input symbols by inventing abstract representations that accommodate unseen symbol sets as well as unseen but similar grammars. The neural network is shown to have the ability to transfer grammatical kno...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602320264079

    authors: Hanson SJ,Negishi M

    update_date:2002-09-01

  • Alignment of coexisting cortical maps in a motor control model.

    abstract::How do multiple feature maps that coexist in the same region of cerebral cortex align with each other? We hypothesize that such alignment is governed by temporal correlations: features in one map that are temporally correlated with those in another come to occupy the same spatial locations in cortex over time. To exam...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1996.8.4.731

    authors: Chen Y,Reggia JA

    update_date:1996-05-15

  • Sequential Tests for Large-Scale Learning.

    abstract::We argue that when faced with big data sets, learning and inference algorithms should compute updates using only subsets of data items. We introduce algorithms that use sequential hypothesis tests to adaptively select such a subset of data points. The statistical properties of this subsampling process can be used to c...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00796

    authors: Korattikara A,Chen Y,Welling M

    update_date:2016-01-01

  • Nonlinear Time-Series Prediction with Missing and Noisy Data.

    abstract::We derive solutions for the problem of missing and noisy data in nonlinear time-series prediction from a probabilistic point of view. We discuss different approximations to the solutions: in particular, approximations that require either stochastic simulation or the substitution of a single estimate for t...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017728

    authors: Tresp V,Hofmann R

    update_date:1998-03-23

  • Sufficient dimension reduction via squared-loss mutual information estimation.

    abstract::The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00407

    authors: Suzuki T,Sugiyama M

    update_date:2013-03-01

  • Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms.

    abstract::In this review, we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity (STDP). This review introduces the most influential models and focuses on two questions: To what degree are reward...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/0899766053011555

    authors: Wörgötter F,Porr B

    update_date:2005-02-01

  • Active Learning for Enumerating Local Minima Based on Gaussian Process Derivatives.

    abstract::We study active learning (AL) based on gaussian processes (GPs) for efficiently enumerating all of the local minimum solutions of a black-box function. This problem is challenging because local solutions are characterized by their zero gradient and positive-definite Hessian properties, but those derivatives cannot be ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01307

    authors: Inatsu Y,Sugita D,Toyoura K,Takeuchi I

    update_date:2020-10-01

  • Feature selection for ordinal text classification.

    abstract::Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. This problem is receiving increased attention from the sentiment analysis and opinion mining community due to the importance of automatically ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00558

    authors: Baccianella S,Esuli A,Sebastiani F

    update_date:2014-03-01