We use efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population to encode a reward distribution. We show that the responses of dopaminergic reward prediction error neurons in mouse and macaque are similar to those of the efficient code in the following ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions and lower slopes; and their slope is higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to the efficient code. The learning rule for the position of the neuron on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
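As a rough illustration of how a position rule in the spirit of distributional reinforcement learning can spread midpoints across a reward distribution, the sketch below applies a generic stochastic quantile-regression update. It is not the learning rule derived in the paper. The function name `track_quantiles`, the evenly spaced target levels `taus`, the Gaussian reward distribution and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def track_quantiles(rewards, n_neurons=10, lr=0.05, seed=0):
    """Move hypothetical neuron midpoints toward quantiles of the rewards.

    Generic quantile-regression update (an assumption for illustration,
    not the rule derived in the paper): each midpoint steps up by lr * tau
    when the reward lies at or above it and down by lr * (1 - tau) otherwise.
    """
    rng = np.random.default_rng(seed)
    # Evenly spaced target quantile levels, one per neuron (assumed).
    taus = (np.arange(n_neurons) + 0.5) / n_neurons
    # Arbitrary initial midpoints.
    midpoints = rng.normal(size=n_neurons)
    for r in rewards:
        below = (r < midpoints).astype(float)  # 1 where reward is under the midpoint
        midpoints += lr * (taus - below)
    return taus, midpoints

# Simulated rewards from an assumed Gaussian distribution.
rng = np.random.default_rng(1)
rewards = rng.normal(loc=5.0, scale=2.0, size=20000)

taus, midpoints = track_quantiles(rewards)
empirical = np.quantile(rewards, taus)
for tau, m, q in zip(taus, midpoints, empirical):
    print(f"tau={tau:.2f}  learned midpoint={m:6.2f}  empirical quantile={q:6.2f}")
```

With a constant learning rate the midpoints keep fluctuating around the quantiles rather than settling exactly. A decaying rate would reduce that jitter at the cost of slower adaptation when the reward distribution changes.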
Disciplines :
Neurosciences & behavior
Author, co-author :
SCHÜTT, Heiko ✱; University of Luxembourg ; Center for Neural Science and Department of Psychology, New York University, New York, NY, USA. heiko.schutt@uni.lu
Kim, Dongjae ✱; Center for Neural Science and Department of Psychology, New York University, New York, NY, USA ; Department of AI-Based Convergence, Dankook University, Yongin, Republic of Korea
Ma, Wei Ji ; Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
✱ These authors have contributed equally to this work.
External co-authors :
yes
Language :
English
Title :
Reward prediction error neurons implement an efficient code for reward.