Abstract:
:Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution that corresponds to the sensitivity of its distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate gamma for the value functions close to 1, these algorithms do not permit gamma to be set exactly at gamma = 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD) as a useful form of the derivative of the stationary state distribution through backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting gamma = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that these can improve the performances of existing PG methods.
journal_name
Neural Computjournal_title
Neural computationauthors
Morimura T,Uchibe E,Yoshimoto J,Peters J,Doya Kdoi
10.1162/neco.2009.12-08-922subject
Has Abstractpub_date
2010-02-01 00:00:00pages
342-76issue
2eissn
0899-7667issn
1530-888Xjournal_volume
22pub_type
杂志文章abstract::Recent studies have employed simple linear dynamical systems to model trial-by-trial dynamics in various sensorimotor learning tasks. Here we explore the theoretical and practical considerations that arise when employing the general class of linear dynamical systems (LDS) as a model for sensorimotor learning. In this ...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976606775774651
更新日期:2006-04-01 00:00:00
abstract::In this letter, we investigate the fundamental limits on how the interspike time of a neuron oscillator can be perturbed by the application of a bounded external control input (a current stimulus) with zero net electric charge accumulation. We use phase models to study the dynamics of neurons and derive charge-balance...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00643
更新日期:2014-10-01 00:00:00
abstract::We have created a network that allocates a new computational unit whenever an unusual pattern is presented to the network. This network forms compact representations, yet learns easily and rapidly. The network can be used at any time in the learning process and the learning patterns do not have to be repeated. The uni...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/neco.1991.3.2.213
更新日期:1991-07-01 00:00:00
abstract::The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for class...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976602753633411
更新日期:2002-05-01 00:00:00
abstract::Changes in GABA modulation may underlie experimentally observed changes in the strength of synaptic transmission at different phases of the theta rhythm (Wyble, Linster, & Hasselmo, 1997). Analysis demonstrates that these changes improve sequence disambiguation by a neural network model of CA3. We show that in the fra...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976698300017539
更新日期:1998-05-15 00:00:00
abstract::In a biological nervous system, astrocytes play an important role in the functioning and interaction of neurons, and astrocytes have excitatory and inhibitory influence on synapses. In this work, with this biological inspiration, a class of computation devices that consist of neurons and astrocytes is introduced, call...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00238
更新日期:2012-03-01 00:00:00
abstract::A stochastic model of spike-timing-dependent plasticity (STDP) postulates that single synapses presented with a single spike pair exhibit all-or-none quantal jumps in synaptic strength. The amplitudes of the jumps are independent of spiking timing, but their probabilities do depend on spiking timing. By making the amp...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/neco.2009.07-08-814
更新日期:2010-01-01 00:00:00
abstract::We study the expressive power of positive neural networks. The model uses positive connection weights and multiple input neurons. Different behaviors can be expressed by varying the connection weights. We show that in discrete time and in the absence of noise, the class of positive neural networks captures the so-call...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00789
更新日期:2015-12-01 00:00:00
abstract::We propose a new principle for replicating receptive field properties of neurons in the primary visual cortex. We derive a learning rule for a feedforward network, which maintains a low firing rate for the output neurons (resulting in temporal sparseness) and allows only a small subset of the neurons in the network to...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00341
更新日期:2012-10-01 00:00:00
abstract::The bias/variance decomposition of mean-squared error is well understood and relatively straightforward. In this note, a similar simple decomposition is derived, valid for any kind of error measure that, when using the appropriate probability model, can be derived from a Kullback-Leibler divergence or log-likelihood. ...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976698300017232
更新日期:1998-07-28 00:00:00
abstract::Linear-nonlinear (LN) models and their extensions have proven successful in describing transformations from stimuli to spiking responses of neurons in early stages of sensory hierarchies. Neural responses at later stages are highly nonlinear and have generally been better characterized in terms of their decoding perfo...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00890
更新日期:2016-11-01 00:00:00
abstract::We discuss robustness against mislabeling in multiclass labels for classification problems and propose two algorithms of boosting, the normalized Eta-Boost.M and Eta-Boost.M, based on the Eta-divergence. Those two boosting algorithms are closely related to models of mislabeling in which the label is erroneously exchan...
journal_title:Neural computation
pub_type: 信件
doi:10.1162/neco.2007.11-06-400
更新日期:2008-06-01 00:00:00
abstract::Place cells in the rat hippocampus play a key role in creating the animal's internal representation of the world. During active navigation, these cells spike only in discrete locations, together encoding a map of the environment. Electrophysiological recordings have shown that the animal can revisit this map mentally ...
journal_title:Neural computation
pub_type: 信件
doi:10.1162/NECO_a_00840
更新日期:2016-06-01 00:00:00
abstract::In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important ...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/08997660260293292
更新日期:2002-10-01 00:00:00
abstract::In this note, we demonstrate that the high firing irregularity produced by the leaky integrate-and-fire neuron with the partial somatic reset mechanism, which has been shown to be the most likely candidate to reflect the mechanism used in the brain for reproducing the highly irregular cortical neuron firing at high ra...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00090
更新日期:2011-03-01 00:00:00
abstract::This article presents a new theoretical framework to consider the dynamics of a stochastic spiking neuron model with general membrane response to input spike. We assume that the input spikes obey an inhomogeneous Poisson process. The stochastic process of the membrane potential then becomes a gaussian process. When a ...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976601317098529
更新日期:2001-12-01 00:00:00
abstract::Binocular fusion takes place over a limited region smaller than one degree of visual angle (Panum's fusional area), which is on the order of the range of preferred disparities measured in populations of disparity-tuned neurons in the visual cortex. However, the actual range of binocular disparities encountered in natu...
journal_title:Neural computation
pub_type: 信件
doi:10.1162/neco.2008.05-07-532
更新日期:2008-10-01 00:00:00
abstract::We consider the effect of the effective timing of a delayed feedback on the excitatory neuron in a recurrent inhibitory loop, when biological realities of firing and absolute refractory period are incorporated into a phenomenological spiking linear or quadratic integrate-and-fire neuron model. We show that such models...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/neco.2007.19.8.2124
更新日期:2007-08-01 00:00:00
abstract::Inspired by recent studies regarding dendritic computation, we constructed a recurrent neural network model incorporating dendritic lateral inhibition. Our model consists of an input layer and a neuron layer that includes excitatory cells and an inhibitory cell; this inhibitory cell is activated by the pooled activiti...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/neco.2007.19.7.1798
更新日期:2007-07-01 00:00:00
abstract::Numerous animal behaviors, such as locomotion in vertebrates, are produced by rhythmic contractions that alternate between two muscle groups. The neuronal networks generating such alternate rhythmic activity are generally thought to rely on pacemaker cells or well-designed circuits consisting of inhibitory and excitat...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976698300017449
更新日期:1998-07-01 00:00:00
abstract::We simulate the inhibition of Ia-glutamatergic excitatory postsynaptic potential (EPSP) by preceding it with glycinergic recurrent (REN) and reciprocal (REC) inhibitory postsynaptic potentials (IPSPs). The inhibition is evaluated in the presence of voltage-dependent conductances of sodium, delayed rectifier potassium,...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00375
更新日期:2013-01-01 00:00:00
abstract::We study the emergence of synchronized burst activity in networks of neurons with spike adaptation. We show that networks of tonically firing adapting excitatory neurons can evolve to a state where the neurons burst in a synchronized manner. The mechanism leading to this burst activity is analyzed in a network of inte...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/08997660151134280
更新日期:2001-05-01 00:00:00
abstract::A simple associationist neural network learns to factor abstract rules (i.e., grammars) from sequences of arbitrary input symbols by inventing abstract representations that accommodate unseen symbol sets as well as unseen but similar grammars. The neural network is shown to have the ability to transfer grammatical kno...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976602320264079
更新日期:2002-09-01 00:00:00
abstract::How do multiple feature maps that coexist in the same region of cerebral cortex align with each other? We hypothesize that such alignment is governed by temporal correlations: features in one map that are temporally correlated with those in another come to occupy the same spatial locations in cortex over time. To exam...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/neco.1996.8.4.731
更新日期:1996-05-15 00:00:00
abstract::We argue that when faced with big data sets, learning and inference algorithms should compute updates using only subsets of data items. We introduce algorithms that use sequential hypothesis tests to adaptively select such a subset of data points. The statistical properties of this subsampling process can be used to c...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00796
更新日期:2016-01-01 00:00:00
abstract::We derive solutions for the problem of missing and noisy data in nonlinear time&hyphenseries prediction from a probabilistic point of view. We discuss different approximations to the solutions &hyphen in particular, approximations that require either stochastic simulation or the substitution of a single estimate for t...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/089976698300017728
更新日期:1998-03-23 00:00:00
abstract::The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00407
更新日期:2013-03-01 00:00:00
abstract::In this review, we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity (STDP). This review introduces the most influential models and focuses on two questions: To what degree are reward...
journal_title:Neural computation
pub_type: 杂志文章,评审
doi:10.1162/0899766053011555
更新日期:2005-02-01 00:00:00
abstract::We study active learning (AL) based on gaussian processes (GPs) for efficiently enumerating all of the local minimum solutions of a black-box function. This problem is challenging because local solutions are characterized by their zero gradient and positive-definite Hessian properties, but those derivatives cannot be ...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/neco_a_01307
更新日期:2020-10-01 00:00:00
abstract::Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. This problem is receiving increased attention from the sentiment analysis and opinion mining community due to the importance of automatically ...
journal_title:Neural computation
pub_type: 杂志文章
doi:10.1162/NECO_a_00558
更新日期:2014-03-01 00:00:00