Abstract:
This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(lambda) algorithms are shown. For policy improvement, two methods are formulated: a continuous actor-critic method and a value-gradient-based greedy policy. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. Advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework. The performance of the proposed algorithms is first tested in a nonlinear control task of swinging a pendulum up with limited torque. It is shown in the simulations that (1) the task is accomplished by the continuous actor-critic method in a number of trials several times fewer than by the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than that based on Euler approximation. The algorithms are then tested in a higher-dimensional task: cart-pole swing-up. This task is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
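As an illustration of the continuous-time TD error mentioned in the abstract, here is a minimal sketch. It assumes the discounted value V(t) = integral from t to infinity of exp(-(s-t)/tau) r(s) ds, which implies dV/dt = V/tau - r, so the error can be taken as delta(t) = r(t) - V(t)/tau + dV/dt, with dV/dt replaced by a backward Euler difference. The function names, the time constant tau, and the test signal are hypothetical and are not taken from the article.

```python
import math


def td_error(r_t, v_t, v_prev, dt, tau):
    """Continuous-time TD error, delta = r - V/tau + dV/dt.

    dV/dt is approximated by the backward Euler difference
    (V(t) - V(t - dt)) / dt. All argument names are illustrative.
    """
    v_dot = (v_t - v_prev) / dt
    return r_t - v_t / tau + v_dot


def check_exponential_reward(tau=2.0, dt=1e-3, t=0.5):
    """TD error for the assumed test reward r(t) = exp(-t).

    For this reward the exact value is V(t) = tau / (1 + tau) * exp(-t),
    so the error should vanish up to the O(dt) discretization error.
    """
    def value(s):
        return tau / (1.0 + tau) * math.exp(-s)

    return td_error(math.exp(-t), value(t), value(t - dt), dt, tau)
```

With a constant reward r the exact value is V = tau * r, so the error is zero for any step size; for a time-varying reward it shrinks in proportion to dt.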
journal_name: Neural Comput
journal_title: Neural computation
authors: Doya K
doi: 10.1162/089976600300015961
subject: Has Abstract
pub_date: 2000-01-01 00:00:00
pages: 219-45
issue: 1
eissn: 0899-7667
issn: 1530-888X
journal_volume: 12
pub_type: journal article
abstract::We present a new supervised learning procedure for ensemble machines, in which outputs of predictors, trained on different distributions, are combined by a dynamic classifier combination model. This procedure may be viewed as either a version of mixture of experts (Jacobs, Jordan, Nowlan, & Hinton, 1991), applied to ...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976699300016737
updated: 1999-02-15 00:00:00
abstract::This letter deals with neural networks as dynamical systems governed by finite difference equations. It shows that the introduction of
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco_a_01165
updated: 2019-03-01 00:00:00
abstract::The relationship between a neuron's complex inputs and its spiking output defines the neuron's coding strategy. This is frequently and effectively modeled phenomenologically by one or more linear filters that extract the components of the stimulus that are relevant for triggering spikes and a nonlinear function that r...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.2009.02-09-956
updated: 2010-03-01 00:00:00
abstract::In this letter, we investigate the fundamental limits on how the interspike time of a neuron oscillator can be perturbed by the application of a bounded external control input (a current stimulus) with zero net electric charge accumulation. We use phase models to study the dynamics of neurons and derive charge-balance...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00643
updated: 2014-10-01 00:00:00
abstract::The new time-organized map (TOM) is presented for a better understanding of the self-organization and geometric structure of cortical signal representations. The algorithm extends the common self-organizing map (SOM) from the processing of purely spatial signals to the processing of spatiotemporal signals. The main ad...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976603765202695
updated: 2003-05-01 00:00:00
abstract::Recurrent neural architectures having oscillatory dynamics use rhythmic network activity to represent patterns stored in short-term memory. Multiple stored patterns can be retained in memory over the same neural substrate because the network's state persistently switches between them. Here we present a simple oscillat...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.2008.02-08-715
updated: 2009-03-01 00:00:00
abstract::This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976602760407973
updated: 2002-11-01 00:00:00
abstract::We develop a group-theoretical analysis of slow feature analysis for the case where the input data are generated by applying a set of continuous transformations to static templates. As an application of the theory, we analytically derive nonlinear visual receptive fields and show that their optimal stimuli, as well as...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00072
updated: 2011-02-01 00:00:00
abstract::The past decade has seen a rise of interest in Laplacian eigenmaps (LEMs) for nonlinear dimensionality reduction. LEMs have been used in spectral clustering, in semisupervised learning, and for providing efficient state representations for reinforcement learning. Here, we show that LEMs are closely related to slow fea...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00214
updated: 2011-12-01 00:00:00
abstract::Recently we presented a stochastic, ensemble-based model of spike-timing-dependent plasticity. In this model, single synapses do not exhibit plasticity depending on the exact timing of pre- and postsynaptic spikes, but spike-timing-dependent plasticity emerges only at the temporal or synaptic ensemble level. We showed...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.2007.19.5.1362
updated: 2007-05-01 00:00:00
abstract::We present a reduction of a Hodgkin-Huxley (HH)-style bursting model to a hybridized integrate-and-fire (IF) formalism based on a thorough bifurcation analysis of the neuron's dynamics. The model incorporates HH-style equations to evolve the subthreshold currents and includes IF mechanisms to characterize spike even...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976603322518768
updated: 2003-12-01 00:00:00
abstract::The expected free energy (EFE) is a central quantity in the theory of active inference. It is the quantity that all active inference agents are mandated to minimize through action, and its decomposition into extrinsic and intrinsic value terms is key to the balance of exploration and exploitation that active inference...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco_a_01354
updated: 2021-01-05 00:00:00
abstract::The hypothesis of invariant maximization of interaction (IMI) is formulated within the setting of random fields. According to this hypothesis, learning processes maximize the stochastic interaction of the neurons subject to constraints. We consider the extrinsic constraint in terms of a fixed input distribution on the...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976602760805368
updated: 2002-12-01 00:00:00
abstract::We show that Langevin Markov chain Monte Carlo inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similar to backpropagation. The backpropagated error is with resp...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00934
updated: 2017-03-01 00:00:00
abstract::Based on the dopamine hypotheses of cocaine addiction and the assumption of decrement of brain reward system sensitivity after long-term drug exposure, we propose a computational model for cocaine addiction. Utilizing average reward temporal difference reinforcement learning, we incorporate the elevation of basal rewa...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.2009.10-08-882
updated: 2009-10-01 00:00:00
abstract::This article presents a new theoretical framework to consider the dynamics of a stochastic spiking neuron model with general membrane response to input spike. We assume that the input spikes obey an inhomogeneous Poisson process. The stochastic process of the membrane potential then becomes a gaussian process. When a ...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976601317098529
updated: 2001-12-01 00:00:00
abstract::The pyloric network of the stomatogastric ganglion in crustacea is a central pattern generator that can produce the same basic rhythm over a wide frequency range. Three electrically coupled neurons, the anterior burster (AB) neuron and two pyloric dilator (PD) neurons, act as a pacemaker unit for the pyloric network. ...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.1991.3.4.487
updated: 1991-01-01 00:00:00
abstract::For any memoryless communication channel with a binary-valued input and a one-dimensional real-valued output, we introduce a probabilistic lower bound on the mutual information given empirical observations on the channel. The bound is built on the Dvoretzky-Kiefer-Wolfowitz inequality and is distribution free. A quadr...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00144
updated: 2011-07-01 00:00:00
abstract::A minimal model is presented to explain changes in frequency, shape, and amplitude of Ca2+ oscillations in the neuroendocrine melanotrope cell of Xenopus laevis. It describes the cell as a plasma membrane oscillator with influx of extracellular Ca2+ via voltage-gated Ca2+ channels in the plasma membrane. The Ca2+ osci...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976601300014655
updated: 2001-01-01 00:00:00
abstract::Field models provide an elegant mathematical framework to analyze large-scale patterns of neural activity. On the microscopic level, these models are usually based on either a firing-rate picture or integrate-and-fire dynamics. This article shows that in spite of the large conceptual differences between the two types ...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/08997660260028656
updated: 2002-07-01 00:00:00
abstract::Recent experimental and computational evidence suggests that several dynamical properties may characterize the operating point of functioning neural networks: critical branching, neutral stability, and production of a wide range of firing patterns. We seek the simplest setting in which these properties emerge, clarify...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00461
updated: 2013-07-01 00:00:00
abstract::We explicitly analyze the trajectories of learning near singularities in hierarchical networks, such as multilayer perceptrons and radial basis function networks, which include permutation symmetry of hidden nodes, and show their general properties. Such symmetry induces singularities in their parameter space, where t...
journal_title:Neural computation
pub_type: letter
doi:10.1162/neco.2007.12-06-414
updated: 2008-03-01 00:00:00
abstract::Spiking neural P systems (SN P systems) are a class of distributed parallel computing devices inspired by spiking neurons, where the spiking rules are usually used in a sequential way (an applicable rule is applied one time at a step) or an exhaustive way (an applicable rule is applied as many times as possible at a s...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/NECO_a_00665
updated: 2014-12-01 00:00:00
abstract::Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in the online mode, and they are not adaptive to ...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/0899766041336396
updated: 2004-09-01 00:00:00
abstract::To understand the interspike interval (ISI) variability displayed by visual cortical neurons (Softky & Koch, 1993), it is critical to examine the dynamics of their neuronal integration, as well as the variability in their synaptic input current. Most previous models have focused on the latter factor. We match a simple...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.1997.9.5.971
updated: 1997-07-01 00:00:00
abstract::Due to many experimental reports of synchronous neural activity in the brain, there is much interest in understanding synchronization in networks of neural oscillators and its potential for computing perceptual organization. Contrary to Hopfield and Herz (1995), we find that networks of locally coupled integrate-and-f...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976699300016160
updated: 1999-10-01 00:00:00
abstract::Attractor networks are widely believed to underlie the memory systems of animals across different species. Existing models have succeeded in qualitatively modeling properties of attractor dynamics, but their computational abilities often suffer from poor representations for realistic complex patterns, spurious attract...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.2010.02-09-957
updated: 2010-05-01 00:00:00
abstract::We present a model of visual computation based on tightly inter-connected cliques of pyramidal cells. It leads to a formal theory of cell assemblies, a specific relationship between correlated firing patterns and abstract functionality, and a direct calculation relating estimates of cortical cell counts to orientation...
journal_title:Neural computation
pub_type: journal article, review
doi:10.1162/089976699300016782
updated: 1999-01-01 00:00:00
abstract::Energy-efficient information transmission may be relevant to biological sensory signal processing as well as to low-power electronic devices. We explore its consequences in two different regimes. In an "immediate" regime, we argue that the information rate should be maximized subject to a power constraint, and in an "...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/089976601300014358
updated: 2001-04-01 00:00:00
abstract::We present an integrative formalism of mutual information expansion, the general Poisson exact breakdown, which explicitly evaluates the informational contribution of correlations in the spike counts both between and within neurons. The formalism was validated on simulated data and applied to real neurons recorded fro...
journal_title:Neural computation
pub_type: journal article
doi:10.1162/neco.2010.04-09-989
updated: 2010-06-01 00:00:00