Human-level control through deep reinforcement learning.

Abstract:

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

journal_name

Nature

journal_title

Nature

authors

Mnih V,Kavukcuoglu K,Silver D,Rusu AA,Veness J,Bellemare MG,Graves A,Riedmiller M,Fidjeland AK,Ostrovski G,Petersen S,Beattie C,Sadik A,Antonoglou I,King H,Kumaran D,Wierstra D,Legg S,Hassabis D

doi

10.1038/nature14236

subject

Has Abstract

pub_date

2015-02-26 00:00:00

pages

529-533

issue

7540

eissn

1476-4687

issn

0028-0836

pii

nature14236

journal_volume

518

pub_type

Journal Article

Related literature

Nature literature collection
  • Monoclonal anti-Fc receptor IgG blocks antibody enhancement of viral replication in macrophages.

    abstract::Flaviviruses, when complexed with antibody at subneutralizing concentrations, show enhanced replication in human and simian peripheral blood leukocytes (ref. 1, and J.S.M.P. and J.S.P., unpublished observations) and in P388 D1 and other macrophage cell lines. A comparable phenomenon has been demonstrated with alphavir...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/289189a0

    authors: Peiris JS,Gordon S,Unkeless JC,Porterfield JS

    update_date:1981-01-15 00:00:00

  • In vivo cell sorting in complementary segmental domains mediated by Eph receptors and ephrins.

    abstract::The restriction of intermingling between specific cell populations is crucial for the maintenance of organized patterns during development. A striking example is the restriction of cell mixing between segments in the insect epidermis and the vertebrate hindbrain that may enable each segment to maintain a distinct iden...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/20452

    authors: Xu Q,Mellitzer G,Robinson V,Wilkinson DG

    update_date:1999-05-20 00:00:00

  • Evidence from the AD 2000 Izu islands earthquake swarm that stressing rate governs seismicity.

    abstract::Magma intrusions and eruptions commonly produce abrupt changes in seismicity far from magma conduits that cannot be associated with the diffusion of pore fluids or heat. Such 'swarm' seismicity also migrates with time, and often exhibits a 'dog-bone'-shaped distribution. The largest earthquakes in swarms produce after...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature00997

    authors: Toda S,Stein RS,Sagiya T

    update_date:2002-09-05 00:00:00

  • How automation is changing work.

    abstract:: ...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/d41586-018-07501-y

    authors: Segal M

    update_date:2018-11-01 00:00:00

  • A binding site on mast cells and basophils for the anti-allergic drug cromolyn.

    abstract::Calcium permeability of basophil and mast cell membranes is stimulated on allergen binding to its specific membrane-bound IgE. This entry of Ca2+ ions into the cell triggers the degranulation and secretion process. Disodium cromoglycate (cromolyn DSCG), the disodium salt of 1,3-bis(-2-carboxychromon-5-yloxy)-2-hydroxy...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/286722a0

    authors: Mazurek N,Berger G,Pecht I

    update_date:1980-08-14 00:00:00

  • SH2 domain specificity and activity modified by a single residue.

    abstract::Many intracellular targets of protein-tyrosine kinases possess Src homology 2 (SH2) domains that directly recognize phosphotyrosine-containing sites on autophosphorylated growth factor receptors and cytoplasmic proteins, and thereby mediate the activation of biochemical signalling pathways. SH2 domains possess relativ...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/369502a0

    authors: Marengere LE,Songyang Z,Gish GD,Schaller MD,Parsons JT,Stern MJ,Cantley LC,Pawson T

    update_date:1994-06-09 00:00:00

  • Inhibition of suppressor T-cell development following deoxyguanosine administration.

    abstract::The expression of immunodeficiency in patients with specific purine enzyme defects indicates a crucial role of the purine salvage pathway in the acquisition and expression of normal immune function. One current hypothesis links the failure of normal lymphocyte development in these diseases to the accumulation of deoxy...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/285494a0

    authors: Dosch HM,Mansour A,Cohen A,Shore A,Gelfand EW

    update_date:1980-06-12 00:00:00

  • Cyclosporine induces cancer progression by a cell-autonomous mechanism.

    abstract::Malignancy is a common and dreaded complication following organ transplantation. The high incidence of neoplasm and its aggressive progression, which are associated with immunosuppressive therapy, are thought to be due to the resulting impairment of the organ recipient's immune-surveillance system. Here we report a me...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/17401

    authors: Hojo M,Morimoto T,Maluccio M,Asano T,Morimoto K,Lagman M,Shimbo T,Suthanthiran M

    update_date:1999-02-11 00:00:00

  • Na channels in skeletal muscle concentrated near the neuromuscular junction.

    abstract::Neuronal function depends crucially on the spatial segregation of specific membrane proteins, particularly the segregation associated with sites of synaptic contact. Understanding the factors governing this localization of proteins is a major goal of cellular neurobiology. A conspicuous example of synaptic specializat...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/313588a0

    authors: Beam KG,Caldwell JH,Campbell DT

    update_date:1985-02-14 00:00:00

  • Limitations in the use of actomyosin threads as model contractile systems.

    abstract::Recent studies have suggested that actomyosin threads may provide a useful model for studying the properties of contractile systems. The development of highly sensitive positional feedback transducers has enabled the properties of these threads to be measured reproducibly. Potential applications include such systems a...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/287338a0

    authors: Altringham JD,Yancey PH,Johnston IA

    update_date:1980-09-25 00:00:00

  • Laser-plasma acceleration of quasi-monoenergetic protons from microstructured targets.

    abstract::Particle acceleration based on high intensity laser systems (a process known as laser-plasma acceleration) has achieved high quality particle beams that compare favourably with conventional acceleration techniques in terms of emittance, brightness and pulse duration. A long-term difficulty associated with laser-plasma...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature04492

    authors: Schwoerer H,Pfotenhauer S,Jäckel O,Amthor KU,Liesfeld B,Ziegler W,Sauerbrey R,Ledingham KW,Esirkepov T

    update_date:2006-01-26 00:00:00

  • Innate lymphoid cells support regulatory T cells in the intestine through interleukin-2.

    abstract::Interleukin (IL)-2 is a pleiotropic cytokine that is necessary to prevent chronic inflammation in the gastrointestinal tract1-4. The protective effects of IL-2 involve the generation, maintenance and function of regulatory T (Treg) cells4-8, and the use of low doses of IL-2 has emerged as a potential therapeutic strat...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/s41586-019-1082-x

    authors: Zhou L,Chu C,Teng F,Bessman NJ,Goc J,Santosa EK,Putzel GG,Kabata H,Kelsen JR,Baldassano RN,Shah MA,Sockolow RE,Vivier E,Eberl G,Smith KA,Sonnenberg GF

    update_date:2019-04-01 00:00:00

  • A highly conserved ATPase protein as a mediator between acidic activation domains and the TATA-binding protein.

    abstract::Biochemical and genetic studies suggest the existence of mediators that work between the activation domains (ADs) of regulatory proteins and the basic transcriptional machinery. We have previously shown genetically that Sug1 interacts with the AD of the yeast activator Gal4. Here we provide evidence that the Sug1 prot...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/374088a0

    authors: Swaffield JC,Melcher K,Johnston SA

    update_date:1995-03-02 00:00:00

  • A function for lipoxygenase in programmed organelle degradation.

    abstract::Membrane-enclosed organelles, a defining characteristic of eukaryotic cells, are lost during differentiation of specific cell types such as reticulocytes (an intermediate in differentiation of erythrocytes), central fibre cells of the eye lens, and keratinocytes. The degradation of these organelles must be tightly reg...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/26500

    authors: van Leyen K,Duvoisin RM,Engelhardt H,Wiedmann M

    update_date:1998-09-24 00:00:00

  • A 61-million-person experiment in social influence and political mobilization.

    abstract::Human behaviour is thought to spread through face-to-face social networks, but it is difficult to identify social influence effects in observational studies, and it is unknown whether online social networks operate in the same way. Here we report results from a randomized controlled trial of political mobilization mes...

    journal_title:Nature

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1038/nature11421

    authors: Bond RM,Fariss CJ,Jones JJ,Kramer AD,Marlow C,Settle JE,Fowler JH

    update_date:2012-09-13 00:00:00

  • A superantigen encoded in the open reading frame of the 3' long terminal repeat of mouse mammary tumour virus.

    abstract::Mice express a collection of superantigens, which bind to class II major histocompatibility proteins and interact with T cells bearing particular V beta chains as part of their alpha beta receptors. These superantigens have been suggested to be encoded by exogenous or endogenous mouse mammary tumour viruses. One such ...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/350203a0

    authors: Choi Y,Kappler JW,Marrack P

    update_date:1991-03-21 00:00:00

  • Cell intrinsic immunity spreads to bystander cells via the intercellular transfer of cGAMP.

    abstract::The innate immune defence of multicellular organisms against microbial pathogens requires cellular collaboration. Information exchange allowing immune cells to collaborate is generally attributed to soluble protein factors secreted by pathogen-sensing cells. Cytokines, such as type I interferons (IFNs), serve to alert...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature12640

    authors: Ablasser A,Schmid-Burgk JL,Hemmerling I,Horvath GL,Schmidt T,Latz E,Hornung V

    update_date:2013-11-28 00:00:00

  • Non-volcanic tremor driven by large transient shear stresses.

    abstract::Non-impulsive seismic radiation or 'tremor' has long been observed at volcanoes and more recently around subduction zones. Although the number of observations of non-volcanic tremor is steadily increasing, the causative mechanism remains unclear. Some have attributed non-volcanic tremor to the movement of fluids, whil...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature06017

    authors: Rubinstein JL,Vidale JE,Gomberg J,Bodin P,Creager KC,Malone SD

    update_date:2007-08-02 00:00:00

  • Disruption of neurotransmission in Drosophila mushroom body blocks retrieval but not acquisition of memory.

    abstract::Surgical, pharmacological and genetic lesion studies have revealed distinct anatomical sites involved with different forms of learning. Studies of patients with localized brain damage and work in rodent model systems, for example, have shown that the hippocampal formation participates in acquisition of declarative tas...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/35078077

    authors: Dubnau J,Grady L,Kitamoto T,Tully T

    update_date:2001-05-24 00:00:00

  • TCR-peptide-MHC interactions in situ show accelerated kinetics and increased affinity.

    abstract::The recognition of foreign antigens by T lymphocytes is essential to most adaptive immune responses. It is driven by specific T-cell antigen receptors (TCRs) binding to antigenic peptide-major histocompatibility complex (pMHC) molecules on other cells. If productive, these interactions promote the formation of an immu...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature08746

    authors: Huppa JB,Axmann M,Mörtelmaier MA,Lillemeier BF,Newell EW,Brameshuber M,Klein LO,Schütz GJ,Davis MM

    update_date:2010-02-18 00:00:00

  • Publisher Correction: Genomic insights into the 2016-2017 cholera epidemic in Yemen.

    abstract::In the HTML version of this Letter, the affiliations for authors Andrew S. Azman, Dhirendra Kumar and Thandavarayan Ramamurthy were inverted (the PDF and print versions of the Letter were correct); the affiliations have been corrected online. ...

    journal_title:Nature

    pub_type: Journal Article, Published Erratum

    doi:10.1038/s41586-019-0966-0

    authors: Weill FX,Domman D,Njamkepo E,Almesbahi AA,Naji M,Nasher SS,Rakesh A,Assiri AM,Sharma NC,Kariuki S,Pourshafie MR,Rauzier J,Abubakar A,Carter JY,Wamala JF,Seguin C,Bouchier C,Malliavin T,Bakhshi B,Abulmaali HHN,Kuma

    update_date:2019-02-01 00:00:00

  • A human neurodevelopmental model for Williams syndrome.

    abstract::Williams syndrome is a genetic neurodevelopmental disorder characterized by an uncommon hypersociability and a mosaic of retained and compromised linguistic and cognitive abilities. Nearly all clinically diagnosed individuals with Williams syndrome lack precisely the same set of genes, with breakpoints in chromosome b...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature19067

    authors: Chailangkarn T,Trujillo CA,Freitas BC,Hrvoj-Mihic B,Herai RH,Yu DX,Brown TT,Marchetto MC,Bardy C,McHenry L,Stefanacci L,Järvinen A,Searcy YM,DeWitt M,Wong W,Lai P,Ard MC,Hanson KL,Romero S,Jacobs B,Dale AM,Dai L

    update_date:2016-08-18 00:00:00

  • Occluded bound calcium on the phosphorylated sarcoplasmic transport ATPase.

    abstract::The Ca2+ + Mg2+-activated ATPase of the sarcoplasmic reticulum is responsible for the active Ca2+ transport of this membrane system, the key feature of which is the formation of an energy-rich phosphorylated transport enzyme (EP) and its conversion. To understand the Ca2+-transport mechanism, it is essential to clarif...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/290271a0

    authors: Takisawa H,Makinose M

    update_date:1981-03-19 00:00:00

  • Common ecology quantifies human insurgency.

    abstract::Many collective human activities, including violence, have been shown to exhibit universal patterns. The size distributions of casualties both in whole wars from 1816 to 1980 and terrorist attacks have separately been shown to follow approximate power-law distributions. However, the possibility of universal patterns r...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature08631

    authors: Bohorquez JC,Gourley S,Dixon AR,Spagat M,Johnson NF

    update_date:2009-12-17 00:00:00

  • Low gene copy number shows that arbuscular mycorrhizal fungi inherit genetically different nuclei.

    abstract::Arbuscular mycorrhizal fungi (AMF) are ancient asexually reproducing organisms that form symbioses with the majority of plant species, improving plant nutrition and promoting plant diversity. Little is known about the evolution or organization of the genomes of any eukaryotic symbiont or ancient asexual organism. Dire...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature03069

    authors: Hijri M,Sanders IR

    update_date:2005-01-13 00:00:00

  • In vitro synthesized bacterial outer membrane protein is integrated into bacterial inner membranes but translocated across microsomal membranes.

    abstract::The LamB protein is an integral membrane protein of the outer membrane of Escherichia coli. We have now found that, when synthesized in an E. coli cell-free translation system supplemented with inverted vesicles derived from the E. coli inner membrane, LamB protein is integrated into the vesicle membrane as assayed by...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/323071a0

    authors: Watanabe M,Hunt JF,Blobel G

    update_date:1986-09-04 00:00:00

  • Summits that matter.

    abstract::The European Commission has made good progress in gathering support for its new programme of basic and applied research. Now Europe's industries and heads of state need to fulfill promises made two years ago. ...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/415457a

    authors:

    update_date:2002-01-31 00:00:00

  • Origin of luteinizing hormone-releasing hormone neurons.

    abstract::Neurons expressing luteinizing hormone-releasing hormone (LHRH), found in the septal-preoptic nuclei and hypothalamus, control the release of gonadotropic hormones from the anterior pituitary gland and facilitate reproductive behaviour. LHRH-expressing neurons are also found in the nervus terminalis, a cranial nerve t...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/338161a0

    authors: Schwanzel-Fukuda M,Pfaff DW

    update_date:1989-03-09 00:00:00

  • Acidic deposition: decline in mobilization of toxic aluminium.

    abstract::The mobilization of aluminium from acidic forest soils is arguably the most ecologically important consequence of acid deposition in the environment because of its adverse effects on soils, forest vegetation and surface water. Here we show that there has been a significant decline in the concentrations of aluminium sp...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/417242a

    authors: Palmer SM,Driscoll CT

    update_date:2002-05-16 00:00:00

  • Sensory-evoked LTP driven by dendritic plateau potentials in vivo.

    abstract::Long-term synaptic potentiation (LTP) is thought to be a key process in cortical synaptic network plasticity and memory formation. Hebbian forms of LTP depend on strong postsynaptic depolarization, which in many models is generated by action potentials that propagate back from the soma into dendrites. However, local d...

    journal_title:Nature

    pub_type: Journal Article

    doi:10.1038/nature13664

    authors: Gambino F,Pagès S,Kehayas V,Baptista D,Tatti R,Carleton A,Holtmaat A

    update_date:2014-11-06 00:00:00