Reinforcement learning in continuous time and space.

Abstract:

This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(lambda) algorithms are shown. For policy improvement, two methods are formulated: a continuous actor-critic method and a value-gradient-based greedy policy. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. Advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework. The performance of the proposed algorithms is first tested in a nonlinear control task of swinging a pendulum up with limited torque. It is shown in the simulations that (1) the task is accomplished by the continuous actor-critic method in several times fewer trials than by the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than that based on Euler approximation. The algorithms are then tested in a higher-dimensional task: cart-pole swing-up. This task is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
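The continuous-time TD error mentioned in the abstract can be illustrated with a small numerical sketch. The abstract does not state the formula, so the form used here is an assumption: for a discounted value function V(t) = ∫ exp(-(s - t)/tau) r(s) ds, the TD error is taken as r(t) - V(t)/tau + dV/dt, with the time derivative replaced by a backward difference. The reward signal, the constants `TAU` and `DT`, and all function names are hypothetical choices for illustration, not the article's implementation.

```python
import math

TAU = 2.0   # discount time constant (illustrative value)
DT = 0.01   # sampling step used for the backward difference


def td_error(r_t, v_t, v_prev, dt=DT, tau=TAU):
    """Continuous-time TD error r(t) - V(t)/tau + dV/dt, with dV/dt
    replaced by the backward difference (V(t) - V(t - dt)) / dt."""
    return r_t - v_t / tau + (v_t - v_prev) / dt


def reward(t):
    # Hypothetical reward signal along one trajectory.
    return math.exp(-t)


def exact_value(t):
    # Closed form of the discounted integral of reward() from t onward:
    # integral_0^inf exp(-u/tau) * exp(-(t + u)) du = exp(-t) * tau / (1 + tau)
    return math.exp(-t) * TAU / (1.0 + TAU)


times = [k * DT for k in range(1, 501)]

# With the exact value function, the TD error vanishes up to O(dt).
err_exact = [td_error(reward(t), exact_value(t), exact_value(t - DT)) for t in times]

# With an all-zero value estimate, the TD error equals the reward itself.
err_zero = [td_error(reward(t), 0.0, 0.0) for t in times]

max_abs_exact = max(abs(e) for e in err_exact)
print(f"max |TD error| with exact V: {max_abs_exact:.2e}")
print(f"TD error with V = 0 at t = {times[0]:.2f}: {err_zero[0]:.4f}")
```

In a learning setting, the value estimate would come from a function approximator and its parameters would be adjusted to shrink this error; the sketch only checks that the error is near zero for a consistent value function and nonzero otherwise.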

journal_name

Neural Comput

journal_title

Neural computation

authors

Doya K

doi

10.1162/089976600300015961

subject

Has Abstract

pub_date

2000-01-01 00:00:00

pages

219-45

issue

1

eissn

1530-888X

issn

0899-7667

journal_volume

12

pub_type

Journal Article
  • Boosted mixture of experts: an ensemble learning scheme.

    abstract::We present a new supervised learning procedure for ensemble machines, in which outputs of predictors, trained on different distributions, are combined by a dynamic classifier combination model. This procedure may be viewed as either a version of mixture of experts (Jacobs, Jordan, Nowlan, & Hinton, 1991), applied to ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016737

    authors: Avnimelech R,Intrator N

    update_date:1999-02-15 00:00:00

  • State-Space Representations of Deep Neural Networks.

    abstract::This letter deals with neural networks as dynamical systems governed by finite difference equations. It shows that the introduction of k-many skip connections into network architectures, such as residual networks and additive dense n...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01165

    authors: Hauser M,Gunn S,Saab S Jr,Ray A

    update_date:2019-03-01 00:00:00

  • Feature selection in simple neurons: how coding depends on spiking dynamics.

    abstract::The relationship between a neuron's complex inputs and its spiking output defines the neuron's coding strategy. This is frequently and effectively modeled phenomenologically by one or more linear filters that extract the components of the stimulus that are relevant for triggering spikes and a nonlinear function that r...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.02-09-956

    authors: Famulare M,Fairhall A

    update_date:2010-03-01 00:00:00

  • Design of charge-balanced time-optimal stimuli for spiking neuron oscillators.

    abstract::In this letter, we investigate the fundamental limits on how the interspike time of a neuron oscillator can be perturbed by the application of a bounded external control input (a current stimulus) with zero net electric charge accumulation. We use phase models to study the dynamics of neurons and derive charge-balance...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00643

    authors: Dasanayake IS,Li JS

    update_date:2014-10-01 00:00:00

  • The time-organized map algorithm: extending the self-organizing map to spatiotemporal signals.

    abstract::The new time-organized map (TOM) is presented for a better understanding of the self-organization and geometric structure of cortical signal representations. The algorithm extends the common self-organizing map (SOM) from the processing of purely spatial signals to the processing of spatiotemporal signals. The main ad...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976603765202695

    authors: Wiemer JC

    update_date:2003-05-01 00:00:00

  • An oscillatory Hebbian network model of short-term memory.

    abstract::Recurrent neural architectures having oscillatory dynamics use rhythmic network activity to represent patterns stored in short-term memory. Multiple stored patterns can be retained in memory over the same neural substrate because the network's state persistently switches between them. Here we present a simple oscillat...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2008.02-08-715

    authors: Winder RK,Reggia JA,Weems SA,Bunting MF

    update_date:2009-03-01 00:00:00

  • Long-term reward prediction in TD models of the dopamine system.

    abstract::This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602760407973

    authors: Daw ND,Touretzky DS

    update_date:2002-11-01 00:00:00

  • A theory of slow feature analysis for transformation-based input signals with an application to complex cells.

    abstract::We develop a group-theoretical analysis of slow feature analysis for the case where the input data are generated by applying a set of continuous transformations to static templates. As an application of the theory, we analytically derive nonlinear visual receptive fields and show that their optimal stimuli, as well as...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00072

    authors: Sprekeler H,Wiskott L

    update_date:2011-02-01 00:00:00

  • On the relation of slow feature analysis and Laplacian eigenmaps.

    abstract::The past decade has seen a rise of interest in Laplacian eigenmaps (LEMs) for nonlinear dimensionality reduction. LEMs have been used in spectral clustering, in semisupervised learning, and for providing efficient state representations for reinforcement learning. Here, we show that LEMs are closely related to slow fea...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00214

    authors: Sprekeler H

    update_date:2011-12-01 00:00:00

  • Multispike interactions in a stochastic model of spike-timing-dependent plasticity.

    abstract::Recently we presented a stochastic, ensemble-based model of spike-timing-dependent plasticity. In this model, single synapses do not exhibit plasticity depending on the exact timing of pre- and postsynaptic spikes, but spike-timing-dependent plasticity emerges only at the temporal or synaptic ensemble level. We showed...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.5.1362

    authors: Appleby PA,Elliott T

    update_date:2007-05-01 00:00:00

  • Hybrid integrate-and-fire model of a bursting neuron.

    abstract::We present a reduction of a Hodgkin-Huxley (HH)-style bursting model to a hybridized integrate-and-fire (IF) formalism based on a thorough bifurcation analysis of the neuron's dynamics. The model incorporates HH-style equations to evolve the subthreshold currents and includes IF mechanisms to characterize spike even...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976603322518768

    authors: Breen BJ,Gerken WC,Butera RJ Jr

    update_date:2003-12-01 00:00:00

  • Whence the Expected Free Energy?

    abstract::The expected free energy (EFE) is a central quantity in the theory of active inference. It is the quantity that all active inference agents are mandated to minimize through action, and its decomposition into extrinsic and intrinsic value terms is key to the balance of exploration and exploitation that active inference...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01354

    authors: Millidge B,Tschantz A,Buckley CL

    update_date:2021-01-05 00:00:00

  • Locality of global stochastic interaction in directed acyclic networks.

    abstract::The hypothesis of invariant maximization of interaction (IMI) is formulated within the setting of random fields. According to this hypothesis, learning processes maximize the stochastic interaction of the neurons subject to constraints. We consider the extrinsic constraint in terms of a fixed input distribution on the...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602760805368

    authors: Ay N

    update_date:2002-12-01 00:00:00

  • STDP-Compatible Approximation of Backpropagation in an Energy-Based Model.

    abstract::We show that Langevin Markov chain Monte Carlo inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similar to backpropagation. The backpropagated error is with resp...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00934

    authors: Bengio Y,Mesnard T,Fischer A,Zhang S,Wu Y

    update_date:2017-03-01 00:00:00

  • A neurocomputational model for cocaine addiction.

    abstract::Based on the dopamine hypotheses of cocaine addiction and the assumption of decrement of brain reward system sensitivity after long-term drug exposure, we propose a computational model for cocaine addiction. Utilizing average reward temporal difference reinforcement learning, we incorporate the elevation of basal rewa...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.10-08-882

    authors: Dezfouli A,Piray P,Keramati MM,Ekhtiari H,Lucas C,Mokri A

    update_date:2009-10-01 00:00:00

  • Gaussian process approach to spiking neurons for inhomogeneous Poisson inputs.

    abstract::This article presents a new theoretical framework to consider the dynamics of a stochastic spiking neuron model with general membrane response to input spike. We assume that the input spikes obey an inhomogeneous Poisson process. The stochastic process of the membrane potential then becomes a gaussian process. When a ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976601317098529

    authors: Amemori KI,Ishii S

    update_date:2001-12-01 00:00:00

  • Oscillating Networks: Control of Burst Duration by Electrically Coupled Neurons.

    abstract::The pyloric network of the stomatogastric ganglion in crustacea is a central pattern generator that can produce the same basic rhythm over a wide frequency range. Three electrically coupled neurons, the anterior burster (AB) neuron and two pyloric dilator (PD) neurons, act as a pacemaker unit for the pyloric network. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1991.3.4.487

    authors: Abbott LF,Marder E,Hooper SL

    update_date:1991-01-01 00:00:00

  • A finite-sample, distribution-free, probabilistic lower bound on mutual information.

    abstract::For any memoryless communication channel with a binary-valued input and a one-dimensional real-valued output, we introduce a probabilistic lower bound on the mutual information given empirical observations on the channel. The bound is built on the Dvoretzky-Kiefer-Wolfowitz inequality and is distribution free. A quadr...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00144

    authors: VanderKraats ND,Banerjee A

    update_date:2011-07-01 00:00:00

  • Minimal model for intracellular calcium oscillations and electrical bursting in melanotrope cells of Xenopus laevis.

    abstract::A minimal model is presented to explain changes in frequency, shape, and amplitude of Ca2+ oscillations in the neuroendocrine melanotrope cell of Xenopus laevis. It describes the cell as a plasma membrane oscillator with influx of extracellular Ca2+ via voltage-gated Ca2+ channels in the plasma membrane. The Ca2+ osci...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976601300014655

    authors: Cornelisse LN,Scheenen WJ,Koopman WJ,Roubos EW,Gielen SC

    update_date:2001-01-01 00:00:00

  • Traveling waves of excitation in neural field models: equivalence of rate descriptions and integrate-and-fire dynamics.

    abstract::Field models provide an elegant mathematical framework to analyze large-scale patterns of neural activity. On the microscopic level, these models are usually based on either a firing-rate picture or integrate-and-fire dynamics. This article shows that in spite of the large conceptual differences between the two types ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/08997660260028656

    authors: Cremers D,Herz AV

    update_date:2002-07-01 00:00:00

  • Neutral stability, rate propagation, and critical branching in feedforward networks.

    abstract::Recent experimental and computational evidence suggests that several dynamical properties may characterize the operating point of functioning neural networks: critical branching, neutral stability, and production of a wide range of firing patterns. We seek the simplest setting in which these properties emerge, clarify...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00461

    authors: Cayco-Gajic NA,Shea-Brown E

    update_date:2013-07-01 00:00:00

  • Dynamics of learning near singularities in layered networks.

    abstract::We explicitly analyze the trajectories of learning near singularities in hierarchical networks, such as multilayer perceptrons and radial basis function networks, which include permutation symmetry of hidden nodes, and show their general properties. Such symmetry induces singularities in their parameter space, where t...

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/neco.2007.12-06-414

    authors: Wei H,Zhang J,Cousseau F,Ozeki T,Amari S

    update_date:2008-03-01 00:00:00

  • Spiking neural P systems with a generalized use of rules.

    abstract::Spiking neural P systems (SN P systems) are a class of distributed parallel computing devices inspired by spiking neurons, where the spiking rules are usually used in a sequential way (an applicable rule is applied one time at a step) or an exhaustive way (an applicable rule is applied as many times as possible at a s...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00665

    authors: Zhang X,Wang B,Pan L

    update_date:2014-12-01 00:00:00

  • Online adaptive decision trees.

    abstract::Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in the online mode, and they are not adaptive to ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766041336396

    authors: Basak J

    update_date:2004-09-01 00:00:00

  • Physiological gain leads to high ISI variability in a simple model of a cortical regular spiking cell.

    abstract::To understand the interspike interval (ISI) variability displayed by visual cortical neurons (Softky & Koch, 1993), it is critical to examine the dynamics of their neuronal integration, as well as the variability in their synaptic input current. Most previous models have focused on the latter factor. We match a simple...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1997.9.5.971

    authors: Troyer TW,Miller KD

    update_date:1997-07-01 00:00:00

  • Synchrony and desynchrony in integrate-and-fire oscillators.

    abstract::Due to many experimental reports of synchronous neural activity in the brain, there is much interest in understanding synchronization in networks of neural oscillators and its potential for computing perceptual organization. Contrary to Hopfield and Herz (1995), we find that networks of locally coupled integrate-and-f...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016160

    authors: Campbell SR,Wang DL,Jayaprakash C

    update_date:1999-10-01 00:00:00

  • A Gaussian attractor network for memory and recognition with experience-dependent learning.

    abstract::Attractor networks are widely believed to underlie the memory systems of animals across different species. Existing models have succeeded in qualitatively modeling properties of attractor dynamics, but their computational abilities often suffer from poor representations for realistic complex patterns, spurious attract...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2010.02-09-957

    authors: Hu X,Zhang B

    update_date:2010-05-01 00:00:00

  • Computing with self-excitatory cliques: A model and an application to hyperacuity-scale computation in visual cortex.

    abstract::We present a model of visual computation based on tightly inter-connected cliques of pyramidal cells. It leads to a formal theory of cell assemblies, a specific relationship between correlated firing patterns and abstract functionality, and a direct calculation relating estimates of cortical cell counts to orientation...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/089976699300016782

    authors: Miller DA,Zucker SW

    update_date:1999-01-01 00:00:00

  • Metabolically efficient information processing.

    abstract::Energy-efficient information transmission may be relevant to biological sensory signal processing as well as to low-power electronic devices. We explore its consequences in two different regimes. In an "immediate" regime, we argue that the information rate should be maximized subject to a power constraint, and in an "...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976601300014358

    authors: Balasubramanian V,Kimber D,Berry MJ 2nd

    update_date:2001-04-01 00:00:00

  • General Poisson exact breakdown of the mutual information to study the role of correlations in populations of neurons.

    abstract::We present an integrative formalism of mutual information expansion, the general Poisson exact breakdown, which explicitly evaluates the informational contribution of correlations in the spike counts both between and within neurons. The formalism was validated on simulated data and applied to real neurons recorded fro...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2010.04-09-989

    authors: Scaglione A,Moxon KA,Foffani G

    update_date:2010-06-01 00:00:00