Abstract:
We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The "responsibility signal," which is given by the softmax function of the prediction errors, is used to weight the outputs of the modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world, and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
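The following is a minimal sketch of the responsibility mechanism described above. It assumes that each module's prediction error is the squared distance between its predicted and observed next state, and that the softmax uses a Gaussian noise scale `sigma`. `sigma`, the learning rate `lr`, and all function names are hypothetical illustrations, not the paper's notation or code. The same responsibilities are used both to mix the module outputs and to scale each module's learning step.

```python
import math


def responsibilities(x_next, predictions, sigma=1.0):
    """Softmax of negative scaled squared prediction errors.

    x_next: observed next state (list of floats).
    predictions: one predicted next state per module.
    sigma: assumed prediction-noise scale (hypothetical parameter).
    """
    sq_errs = [sum((p - x) ** 2 for p, x in zip(pred, x_next)) for pred in predictions]
    logits = [-e / (2.0 * sigma ** 2) for e in sq_errs]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def mix_outputs(lam, outputs):
    """Responsibility-weighted sum of the modules' controller outputs."""
    dim = len(outputs[0])
    return [sum(l * out[k] for l, out in zip(lam, outputs)) for k in range(dim)]


def gated_step(params, grads, lam, lr=0.1):
    """Gradient step for each module, scaled by that module's responsibility."""
    return [
        [p - lr * l * g for p, g in zip(ps, gs)]
        for ps, gs, l in zip(params, grads, lam)
    ]


# Two modules: the first predicts the observed state exactly, the second does not.
lam = responsibilities([1.0, 0.0], [[1.0, 0.0], [0.0, 0.0]])
action = mix_outputs(lam, [[1.0], [0.0]])
new_params = gated_step([[0.0], [0.0]], [[1.0], [1.0]], lam)
```

The better-predicting module receives the larger responsibility. It therefore dominates both the combined action and its own learning update, while the poorly predicting module learns more slowly in that part of the state space.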
journal_name: Neural Comput
journal_title: Neural computation
authors: Doya K, Samejima K, Katagiri K, Kawato M
doi: 10.1162/089976602753712972
subject: Has Abstract
pub_date: 2002-06-01 00:00:00
pages: 1347-69
issue: 6
issn: 0899-7667
eissn: 1530-888X
journal_volume: 14
pub_type: Journal Article
abstract: Observable operator models (OOMs) are a class of models for stochastic processes that properly subsumes the class that can be modeled by finite-dimensional hidden Markov models (HMMs). One of the main advantages of OOMs over HMMs is that they admit asymptotically correct learning algorithms. A series of learning algor...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2009.10-08-878
update_date: 2009-12-01 00:00:00
abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing it...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2009.12-07-678
update_date: 2010-01-01 00:00:00
abstract: We considered a gamma distribution of interspike intervals as a statistical model for neuronal spike generation. A gamma distribution is a natural extension of the Poisson process taking the effect of a refractory period into account. The model is specified by two parameters: a time-dependent firing rate and a shape p...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2006.18.10.2359
update_date: 2006-10-01 00:00:00
abstract: Recent studies have employed simple linear dynamical systems to model trial-by-trial dynamics in various sensorimotor learning tasks. Here we explore the theoretical and practical considerations that arise when employing the general class of linear dynamical systems (LDS) as a model for sensorimotor learning. In this ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976606775774651
update_date: 2006-04-01 00:00:00
abstract: Many neurons that initially respond to a stimulus stop responding if the stimulus is presented repeatedly but recover their response if a different stimulus is presented. This phenomenon is referred to as stimulus-specific adaptation (SSA). SSA has been investigated extensively using oddball experiments, which measure...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00077
update_date: 2011-02-01 00:00:00
abstract: We derive a synaptic weight update rule for learning temporally precise spike train-to-spike train transformations in multilayer feedforward networks of spiking neurons. The framework, aimed at seamlessly generalizing error backpropagation to the deterministic spiking neuron setting, is based strictly on spike timing ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00829
update_date: 2016-05-01 00:00:00
abstract: Changes in GABA modulation may underlie experimentally observed changes in the strength of synaptic transmission at different phases of the theta rhythm (Wyble, Linster, & Hasselmo, 1997). Analysis demonstrates that these changes improve sequence disambiguation by a neural network model of CA3. We show that in the fra...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976698300017539
update_date: 1998-05-15 00:00:00
abstract: We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their cofiring (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1s (spike) and 0s (silence) for each neuron ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00631
update_date: 2014-09-01 00:00:00
abstract: The relationship between a neuron's complex inputs and its spiking output defines the neuron's coding strategy. This is frequently and effectively modeled phenomenologically by one or more linear filters that extract the components of the stimulus that are relevant for triggering spikes and a nonlinear function that r...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2009.02-09-956
update_date: 2010-03-01 00:00:00
abstract: The visual systems of many mammals, including humans, are able to integrate the geometric information of visual stimuli and perform cognitive tasks at the first stages of the cortical processing. This is thought to be the result of a combination of mechanisms, which include feature extraction at the single cell level ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00738
update_date: 2015-06-01 00:00:00
abstract: Ramping neuronal activity refers to spiking activity with a rate that increases quasi-linearly over time. It has been observed in multiple cortical areas and is correlated with evidence accumulation processes or timing. In this work, we investigated the downstream effect of ramping neuronal activity through synapses t...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00818
update_date: 2016-04-01 00:00:00
abstract: We derive solutions for the problem of missing and noisy data in nonlinear time-series prediction from a probabilistic point of view. We discuss different approximations to the solutions, in particular, approximations that require either stochastic simulation or the substitution of a single estimate for t...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976698300017728
update_date: 1998-03-23 00:00:00
abstract: An iterative reweighted least squares (IRWLS) procedure recently proposed is shown to converge to the support vector machine solution. The convergence to a stationary point is ensured by modifying the original IRWLS procedure. ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/0899766052530875
update_date: 2005-01-01 00:00:00
abstract: In this review, we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity (STDP). This review introduces the most influential models and focuses on two questions: To what degree are reward...
journal_title: Neural computation
pub_type: Journal Article, Review
doi: 10.1162/0899766053011555
update_date: 2005-02-01 00:00:00
abstract: This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improv...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976600300015961
update_date: 2000-01-01 00:00:00
abstract: Inner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this letter is the construction of biologically motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/0899766053019944
update_date: 2005-03-01 00:00:00
abstract: A modular, recurrent connectionist network is taught to incrementally parse complex sentences. From input presented one word at a time, the network learns to do semantic role assignment, noun phrase attachment, and clause structure recognition, for sentences with both active and passive constructions and center-embedd...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.1991.3.1.110
update_date: 1991-04-01 00:00:00
abstract: Mechanisms influencing learning in neural networks are usually investigated on either a local or a global scale. The former relates to synaptic processes, the latter to unspecific modulatory systems. Here we study the interaction of a local learning rule that evaluates coincidences of pre- and postsynaptic action pote...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976600300015682
update_date: 2000-03-01 00:00:00
abstract: Large-scale data collection efforts to map the brain are underway at multiple spatial and temporal scales, but all face fundamental problems posed by high-dimensional data and intersubject variability. Even seemingly simple problems, such as identifying a neuron/brain region across animals/subjects, become exponential...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00852
update_date: 2016-08-01 00:00:00
abstract: Complexity of one-hidden-layer networks is studied using tools from nonlinear approximation and integration theory. For functions with suitable integral representations in the form of networks with infinitely many hidden units, upper bounds are derived on the speed of decrease of approximation error as the number of n...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2009.04-08-745
update_date: 2009-10-01 00:00:00
abstract: Integrate-and-express models of synaptic plasticity propose that synapses integrate plasticity induction signals before expressing synaptic plasticity. By discerning trends in their induction signals, synapses can control destabilizing fluctuations in synaptic strength. In a feedforward perceptron framework with binar...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00889
update_date: 2016-11-01 00:00:00
abstract: We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving nonconvex optimization problems. These methods search a wider range by gradually decreasing the randomness added to the standard gradient descent method. The formulation that we define on the basis of th...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco_a_01089
update_date: 2018-07-01 00:00:00
abstract: In traditional event-driven strategies, spike timings are analytically given or calculated with arbitrary precision (up to machine precision). Exact computation is possible only for simplified neuron models, mainly the leaky integrate-and-fire model. In a recent paper, Zheng, Tonnelier, and Martinez (2009) introduced ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/NECO_a_00112
update_date: 2011-05-01 00:00:00
abstract: A necessary ingredient for a quantitative theory of neural coding is appropriate "spike kinematics": a precise description of spike trains. While summarizing experiments by complete spike time collections is clearly inefficient and probably unnecessary, the most common probabilistic model used in neurophysiology, the ...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2009.07-08-828
update_date: 2009-08-01 00:00:00
abstract: Due to many experimental reports of synchronous neural activity in the brain, there is much interest in understanding synchronization in networks of neural oscillators and its potential for computing perceptual organization. Contrary to Hopfield and Herz (1995), we find that networks of locally coupled integrate-and-f...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976699300016160
update_date: 1999-10-01 00:00:00
abstract: This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976602760407973
update_date: 2002-11-01 00:00:00
abstract: Natural gradient learning is known to be efficient in escaping plateau, which is a main cause of the slow learning speed of neural networks. The adaptive natural gradient learning method for practical implementation also has been developed, and its advantage in real-world problems has been confirmed. In this letter, w...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/089976604322742065
update_date: 2004-02-01 00:00:00
abstract: Bursting plays an important role in neural communication. At the population level, macroscopic bursting has been identified in populations of neurons that do not express intrinsic bursting mechanisms. For the analysis of phase transitions between bursting and non-bursting states, mean-field descriptions of macroscopic...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco_a_01300
update_date: 2020-09-01 00:00:00
abstract: Humans learn categories of complex objects quickly and from a few examples. Random projection has been suggested as a means to learn and categorize efficiently. We investigate how random projection affects categorization by humans and by very simple neural networks on the same stimuli and categorization tasks, and how...
journal_title: Neural computation
pub_type: Letter
doi: 10.1162/NECO_a_00769
update_date: 2015-10-01 00:00:00
abstract: Inspired by recent studies regarding dendritic computation, we constructed a recurrent neural network model incorporating dendritic lateral inhibition. Our model consists of an input layer and a neuron layer that includes excitatory cells and an inhibitory cell; this inhibitory cell is activated by the pooled activiti...
journal_title: Neural computation
pub_type: Journal Article
doi: 10.1162/neco.2007.19.7.1798
update_date: 2007-07-01 00:00:00