Value-driven adaptations of mesolimbic dopamine release are governed by both model-based and model-free mechanisms
The magnitude of dopamine signals elicited by rewarding events and their predictors is updated when reward value changes. It is actively debated how readily these dopamine signals adapt and whether adaptation aligns with model-free or model-based reinforcement-learning principles. To investigate this, we trained male rats in a Pavlovian-conditioning paradigm and measured dopamine release in the nucleus accumbens core in response to food reward (unconditioned stimulus) and reward-predictive conditioned stimuli (CS), both before and after reward devaluation induced via either sensory-specific or non-specific satiety. We demonstrate that 1) such devaluation rapidly reduces CS-induced dopamine release, without additional pairing of the CS with the devalued reward and irrespective of whether the devaluation was sensory-specific or non-specific. In contrast, 2) reward devaluation did not decrease food reward-induced dopamine release. Surprisingly, 3) post-devaluation reconditioning, by additional pairing of the CS with the devalued reward, rapidly reinstated CS-induced dopamine signals to pre-devaluation levels. Taken together, we identify distinct, divergent adaptations in dopamine-signal magnitude when reward value is decreased: CS dopamine diminishes but is rapidly reinstated, whereas reward dopamine is resistant to change. Corresponding to the findings above, this implies that 1) CS dopamine may be governed by a model-based mechanism and 2) reward dopamine by a model-free one, where 3) the latter may contribute to the swift reinstatement of the former. However, changes in CS dopamine were not selective for the sensory specificity of reward devaluation, which is inconsistent with model-based processes. Thus, mesolimbic dopamine signaling incorporates both model-free and model-based mechanisms and is not exclusively governed by either.

Significance Statement

Although it is well known that dopamine plays a principal role in reward learning, the temporal dynamics of the dopamine response to changing reward values, and the theoretical framework that describes them, are debated. Most studies conceptualize and classify dopamine signals as governed exclusively by either model-based or model-free processes. However, our work shows involvement of both processes: the temporal dynamics of the dopamine response to conditioned stimuli appear model-based, whereas the persistence of the dopamine response to the reward itself appears model-free. The implication of our findings is that either model-free and model-based dynamics can operate in a mixed framework, or that these reinforcement-learning concepts are not apt for describing the activity of the mesolimbic dopamine system in this experimental context.
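To make the contrast between the two learning accounts concrete, the sketch below illustrates (in Python) how a model-free temporal-difference learner and a model-based learner would value a conditioned stimulus after reward devaluation. This is only a conceptual illustration of the textbook distinction, not the authors' analysis or model; all parameter and reward values are hypothetical.

```python
# Illustrative sketch (not the authors' analysis): how a model-free TD learner
# and a model-based learner value a conditioned stimulus (CS) after reward
# devaluation. All parameter and reward values are hypothetical.

ALPHA = 0.1  # learning rate for the model-free TD update (hypothetical)

def train(n_trials, reward_value, v_cs=0.0):
    """Model-free learning: V(CS) changes only through experienced prediction errors."""
    for _ in range(n_trials):
        delta = reward_value - v_cs      # prediction error on each CS-reward pairing
        v_cs += ALPHA * delta
    return v_cs

# Conditioning: CS repeatedly paired with a valued reward (value 1.0).
v_model_free = train(n_trials=50, reward_value=1.0)

# Devaluation (e.g., satiety) lowers the reward's current value to 0.2.
devalued_reward = 0.2

# Model-free prediction: without further CS-reward pairings, the cached V(CS)
# is unchanged, so CS-evoked signals should NOT drop immediately.
print(f"model-free V(CS) right after devaluation: {v_model_free:.2f}")

# Model-based prediction: V(CS) is recomputed from the learned CS -> reward
# transition and the reward's *current* value, so it drops at once.
p_reward_given_cs = 1.0                  # CS is always followed by reward
v_model_based = p_reward_given_cs * devalued_reward
print(f"model-based V(CS) right after devaluation: {v_model_based:.2f}")

# Reconditioning: further pairings with the devalued reward let the model-free
# value catch up, illustrating how new experience updates a cached value.
v_model_free = train(n_trials=50, reward_value=devalued_reward, v_cs=v_model_free)
print(f"model-free V(CS) after reconditioning: {v_model_free:.2f}")
```

Under these assumptions, a purely model-free account predicts that CS-evoked dopamine should change only after renewed CS-reward pairings, whereas a purely model-based account predicts an immediate drop after devaluation; the study's finding that CS dopamine drops immediately but reward dopamine persists is what motivates the mixed interpretation above.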