Multiple model-based reinforcement learning.

Abstract:

We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The "responsibility signal," which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
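The responsibility signal described in the abstract can be illustrated with a minimal sketch. The abstract states only that it is a softmax of the prediction errors; the Gaussian-style scaling by a width `sigma`, the function names `responsibility` and `blend`, and the scalar outputs are assumptions made here for illustration, not details taken from this record.

```python
import math


def responsibility(errors, sigma=1.0):
    """Softmax over negative scaled squared prediction errors.

    errors: one prediction error per module (hypothetical scalar form).
    sigma:  assumed width parameter; smaller values sharpen the selection.
    """
    scores = [-(e ** 2) / (2.0 * sigma ** 2) for e in errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def blend(outputs, lam):
    """Weight each module's output by its responsibility and sum."""
    return sum(l * u for l, u in zip(lam, outputs))


# Three modules: the one with the smallest error dominates the blend.
lam = responsibility([0.1, 1.0, 2.0])
action = blend([0.5, -0.2, 0.9], lam)
```

In this sketch the same weights `lam` could also scale each module's learning rate, which is one way to read the abstract's statement that the signal gates learning.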

journal_name

Neural Comput

journal_title

Neural computation

authors

Doya K,Samejima K,Katagiri K,Kawato M

doi

10.1162/089976602753712972

subject

Has Abstract

pub_date

2002-06-01 00:00:00

pages

1347-69

issue

6

eissn

1530-888X

issn

0899-7667

journal_volume

14

pub_type

Journal Article
  • Making the error-controlling algorithm of observable operator models constructive.

    abstract::Observable operator models (OOMs) are a class of models for stochastic processes that properly subsumes the class that can be modeled by finite-dimensional hidden Markov models (HMMs). One of the main advantages of OOMs over HMMs is that they admit asymptotically correct learning algorithms. A series of learning algor...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.10-08-878

    authors: Zhao MJ,Jaeger H,Thon M

    update_date:2009-12-01 00:00:00

  • The computational structure of spike trains.

    abstract::Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing it...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.12-07-678

    authors: Haslinger R,Klinkner KL,Shalizi CR

    update_date:2010-01-01 00:00:00

  • Estimating spiking irregularities under changing environments.

    abstract::We considered a gamma distribution of interspike intervals as a statistical model for neuronal spike generation. A gamma distribution is a natural extension of the Poisson process taking the effect of a refractory period into account. The model is specified by two parameters: a time-dependent firing rate and a shape p...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2006.18.10.2359

    authors: Miura K,Okada M,Amari S

    update_date:2006-10-01 00:00:00

  • Modeling sensorimotor learning with linear dynamical systems.

    abstract::Recent studies have employed simple linear dynamical systems to model trial-by-trial dynamics in various sensorimotor learning tasks. Here we explore the theoretical and practical considerations that arise when employing the general class of linear dynamical systems (LDS) as a model for sensorimotor learning. In this ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976606775774651

    authors: Cheng S,Sabes PN

    update_date:2006-04-01 00:00:00

  • Abstract stimulus-specific adaptation models.

    abstract::Many neurons that initially respond to a stimulus stop responding if the stimulus is presented repeatedly but recover their response if a different stimulus is presented. This phenomenon is referred to as stimulus-specific adaptation (SSA). SSA has been investigated extensively using oddball experiments, which measure...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00077

    authors: Mill R,Coath M,Wennekers T,Denham SL

    update_date:2011-02-01 00:00:00

  • Learning Precise Spike Train-to-Spike Train Transformations in Multilayer Feedforward Neuronal Networks.

    abstract::We derive a synaptic weight update rule for learning temporally precise spike train-to-spike train transformations in multilayer feedforward networks of spiking neurons. The framework, aimed at seamlessly generalizing error backpropagation to the deterministic spiking neuron setting, is based strictly on spike timing ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00829

    authors: Banerjee A

    update_date:2016-05-01 00:00:00

  • Changes in GABAB modulation during a theta cycle may be analogous to the fall of temperature during annealing.

    abstract::Changes in GABA modulation may underlie experimentally observed changes in the strength of synaptic transmission at different phases of the theta rhythm (Wyble, Linster, & Hasselmo, 1997). Analysis demonstrates that these changes improve sequence disambiguation by a neural network model of CA3. We show that in the fra...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017539

    authors: Sohal VS,Hasselmo ME

    update_date:1998-05-15 00:00:00

  • A semiparametric Bayesian model for detecting synchrony among multiple neurons.

    abstract::We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their cofiring (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1s (spike) and 0s (silence) for each neuron ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00631

    authors: Shahbaba B,Zhou B,Lan S,Ombao H,Moorman D,Behseta S

    update_date:2014-09-01 00:00:00

  • Feature selection in simple neurons: how coding depends on spiking dynamics.

    abstract::The relationship between a neuron's complex inputs and its spiking output defines the neuron's coding strategy. This is frequently and effectively modeled phenomenologically by one or more linear filters that extract the components of the stimulus that are relevant for triggering spikes and a nonlinear function that r...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.02-09-956

    authors: Famulare M,Fairhall A

    update_date:2010-03-01 00:00:00

  • Cortical spatiotemporal dimensionality reduction for visual grouping.

    abstract::The visual systems of many mammals, including humans, are able to integrate the geometric information of visual stimuli and perform cognitive tasks at the first stages of the cortical processing. This is thought to be the result of a combination of mechanisms, which include feature extraction at the single cell level ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00738

    authors: Cocci G,Barbieri D,Citti G,Sarti A

    update_date:2015-06-01 00:00:00

  • Downstream Effect of Ramping Neuronal Activity through Synapses with Short-Term Plasticity.

    abstract::Ramping neuronal activity refers to spiking activity with a rate that increases quasi-linearly over time. It has been observed in multiple cortical areas and is correlated with evidence accumulation processes or timing. In this work, we investigated the downstream effect of ramping neuronal activity through synapses t...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00818

    authors: Wei W,Wang XJ

    update_date:2016-04-01 00:00:00

  • Nonlinear Time-Series Prediction with Missing and Noisy Data

    abstract::We derive solutions for the problem of missing and noisy data in nonlinear time-series prediction from a probabilistic point of view. We discuss different approximations to the solutions, in particular approximations that require either stochastic simulation or the substitution of a single estimate for t...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976698300017728

    authors: Tresp V V,Hofmann R

    update_date:1998-03-23 00:00:00

  • Convergence of the IRWLS Procedure to the Support Vector Machine Solution.

    abstract::An iterative reweighted least squares (IRWLS) procedure recently proposed is shown to converge to the support vector machine solution. The convergence to a stationary point is ensured by modifying the original IRWLS procedure. ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766052530875

    authors: Pérez-Cruz F,Bousoño-Calzón C,Artés-Rodríguez A

    update_date:2005-01-01 00:00:00

  • Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms.

    abstract::In this review, we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity (STDP). This review introduces the most influential models and focuses on two questions: To what degree are reward...

    journal_title:Neural computation

    pub_type: Journal Article, Review

    doi:10.1162/0899766053011555

    authors: Wörgötter F,Porr B

    update_date:2005-02-01 00:00:00

  • Reinforcement learning in continuous time and space.

    abstract::This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improv...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976600300015961

    authors: Doya K

    update_date:2000-01-01 00:00:00

  • Spikernels: predicting arm movements by embedding population spike rate patterns in inner-product spaces.

    abstract::Inner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this letter is the construction of biologically motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/0899766053019944

    authors: Shpigelman L,Singer Y,Paz R,Vaadia E

    update_date:2005-03-01 00:00:00

  • Parsing Complex Sentences with Structured Connectionist Networks.

    abstract::A modular, recurrent connectionist network is taught to incrementally parse complex sentences. From input presented one word at a time, the network learns to do semantic role assignment, noun phrase attachment, and clause structure recognition, for sentences with both active and passive constructions and center-embedd...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.1991.3.1.110

    authors: Jain AN

    update_date:1991-04-01 00:00:00

  • Local and global gating of synaptic plasticity.

    abstract::Mechanisms influencing learning in neural networks are usually investigated on either a local or a global scale. The former relates to synaptic processes, the latter to unspecific modulatory systems. Here we study the interaction of a local learning rule that evaluates coincidences of pre- and postsynaptic action pote...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976600300015682

    authors: Sánchez-Montañés MA,Verschure PF,König P

    update_date:2000-03-01 00:00:00

  • Scalable Semisupervised Functional Neurocartography Reveals Canonical Neurons in Behavioral Networks.

    abstract::Large-scale data collection efforts to map the brain are underway at multiple spatial and temporal scales, but all face fundamental problems posed by high-dimensional data and intersubject variability. Even seemingly simple problems, such as identifying a neuron/brain region across animals/subjects, become exponential...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00852

    authors: Frady EP,Kapoor A,Horvitz E,Kristan WB Jr

    update_date:2016-08-01 00:00:00

  • An integral upper bound for neural network approximation.

    abstract::Complexity of one-hidden-layer networks is studied using tools from nonlinear approximation and integration theory. For functions with suitable integral representations in the form of networks with infinitely many hidden units, upper bounds are derived on the speed of decrease of approximation error as the number of n...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.04-08-745

    authors: Kainen PC,Kůrková V

    update_date:2009-10-01 00:00:00

  • Variations on the Theme of Synaptic Filtering: A Comparison of Integrate-and-Express Models of Synaptic Plasticity for Memory Lifetimes.

    abstract::Integrate-and-express models of synaptic plasticity propose that synapses integrate plasticity induction signals before expressing synaptic plasticity. By discerning trends in their induction signals, synapses can control destabilizing fluctuations in synaptic strength. In a feedforward perceptron framework with binar...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00889

    authors: Elliott T

    update_date:2016-11-01 00:00:00

  • Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization.

    abstract::We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving nonconvex optimization problems. These methods search a wider range by gradually decreasing the randomness added to the standard gradient descent method. The formulation that we define on the basis of th...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01089

    authors: Takase T,Oyama S,Kurihara M

    update_date:2018-07-01 00:00:00

  • On the performance of voltage stepping for the simulation of adaptive, nonlinear integrate-and-fire neuronal networks.

    abstract::In traditional event-driven strategies, spike timings are analytically given or calculated with arbitrary precision (up to machine precision). Exact computation is possible only for simplified neuron models, mainly the leaky integrate-and-fire model. In a recent paper, Zheng, Tonnelier, and Martinez (2009) introduced ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/NECO_a_00112

    authors: Kaabi MG,Tonnelier A,Martinez D

    update_date:2011-05-01 00:00:00

  • Direct estimation of inhomogeneous Markov interval models of spike trains.

    abstract::A necessary ingredient for a quantitative theory of neural coding is appropriate "spike kinematics": a precise description of spike trains. While summarizing experiments by complete spike time collections is clearly inefficient and probably unnecessary, the most common probabilistic model used in neurophysiology, the ...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2009.07-08-828

    authors: Wójcik DK,Mochol G,Jakuczun W,Wypych M,Waleszczyk WJ

    update_date:2009-08-01 00:00:00

  • Synchrony and desynchrony in integrate-and-fire oscillators.

    abstract::Due to many experimental reports of synchronous neural activity in the brain, there is much interest in understanding synchronization in networks of neural oscillators and its potential for computing perceptual organization. Contrary to Hopfield and Herz (1995), we find that networks of locally coupled integrate-and-f...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976699300016160

    authors: Campbell SR,Wang DL,Jayaprakash C

    update_date:1999-10-01 00:00:00

  • Long-term reward prediction in TD models of the dopamine system.

    abstract::This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976602760407973

    authors: Daw ND,Touretzky DS

    update_date:2002-11-01 00:00:00

  • Improving generalization performance of natural gradient learning using optimized regularization by NIC.

    abstract::Natural gradient learning is known to be efficient in escaping plateau, which is a main cause of the slow learning speed of neural networks. The adaptive natural gradient learning method for practical implementation also has been developed, and its advantage in real-world problems has been confirmed. In this letter, w...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/089976604322742065

    authors: Park H,Murata N,Amari S

    update_date:2004-02-01 00:00:00

  • A Mean-Field Description of Bursting Dynamics in Spiking Neural Networks with Short-Term Adaptation.

    abstract::Bursting plays an important role in neural communication. At the population level, macroscopic bursting has been identified in populations of neurons that do not express intrinsic bursting mechanisms. For the analysis of phase transitions between bursting and non-bursting states, mean-field descriptions of macroscopic...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco_a_01300

    authors: Gast R,Schmidt H,Knösche TR

    update_date:2020-09-01 00:00:00

  • Visual Categorization with Random Projection.

    abstract::Humans learn categories of complex objects quickly and from a few examples. Random projection has been suggested as a means to learn and categorize efficiently. We investigate how random projection affects categorization by humans and by very simple neural networks on the same stimuli and categorization tasks, and how...

    journal_title:Neural computation

    pub_type: Letter

    doi:10.1162/NECO_a_00769

    authors: Arriaga RI,Rutter D,Cakmak M,Vempala SS

    update_date:2015-10-01 00:00:00

  • Selectivity and stability via dendritic nonlinearity.

    abstract::Inspired by recent studies regarding dendritic computation, we constructed a recurrent neural network model incorporating dendritic lateral inhibition. Our model consists of an input layer and a neuron layer that includes excitatory cells and an inhibitory cell; this inhibitory cell is activated by the pooled activiti...

    journal_title:Neural computation

    pub_type: Journal Article

    doi:10.1162/neco.2007.19.7.1798

    authors: Morita K,Okada M,Aihara K

    update_date:2007-07-01 00:00:00