The Role of Striatal Subregions in Reinforcement Learning Process and Reward Prediction Error using Excitotoxic Lesion in Male Mice

Bibliographic Details
Main Authors: Ya-Wen Liu, 劉雅文
Other Authors: Wen-Sung Lai
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/92972493319314380763
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Psychology === 103 === The striatum is the principal input structure of the basal ganglia and influences motor control and reward-based learning. Emerging studies indicate that it also contributes to the updating of action values and to the reward prediction error (RPE), the discrepancy between predicted and actual rewards. Previous studies imply that three subregions of the striatum participate in different kinds of learning processes. The dorsomedial striatum (DMS, also known as the “associative striatum” in primates), which receives inputs from the association cortices, is implicated in goal-directed behavior in rodents. The dorsolateral striatum (DLS, a part of the sensorimotor striatum in primates) is related to habit learning in rodents. The nucleus accumbens (NA) is implicated in representing predicted future reward, and this representation can be used to guide action selection for reward. However, the precise role or mechanism of each subregion in reinforcement learning and reward-based decision making is still under debate. The aim of this study is to examine the role of the striatal subregions (DMS, DLS, and NA) in the reinforcement learning process and in reward prediction error, using excitotoxic lesions and a 2-choice dynamic foraging task in male C57BL/6 mice. The 2-choice dynamic foraging task is a risky-choice task that consisted of two kinds of reward-ratio learning. The behavioral performance of each of the three lesioned groups and of their sham controls was recorded. Their trial-by-trial choice behavior was further analyzed and fitted with a standard reinforcement learning model, using a Bayesian estimation approach and matching law analysis, to estimate parameters for RPE and reward sensitivity. Compared to sham controls, the overall behavioral results indicated that the DMS-lesioned mice needed more trials to reach the preset criteria and made more cumulative errors while learning this dynamic foraging task.
In contrast to the DMS group, neither the NA-lesioned nor the DLS-lesioned group needed more cumulative trials or made more cumulative errors. Reinforcement learning model analysis further revealed that both DMS-lesioned and NA-lesioned mice had a lower learning rate for updating from the RPE signal and slightly higher perseveration than their sham controls. However, no significant difference in reward sensitivity was found among the three groups. Collectively, the current study confirmed the importance of the DMS and NA in the 2-choice dynamic foraging task and their roles in the value component and choice component of decision making. Excitotoxic lesion of the DMS can significantly impair performance in probabilistic reward-based learning and decision making.
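
The summary does not give the equations of the reinforcement learning model, so the following is only a minimal sketch under a commonly used formulation: a delta-rule value update driven by the RPE, and a softmax choice rule with a perseveration term. The function names, the parameter names (`alpha` for learning rate, `beta` for reward sensitivity, `kappa` for perseveration), and the reward probabilities are illustrative assumptions, not the thesis's actual model or settings.

```python
import math
import random


def update_value(q, reward, alpha):
    """Delta-rule update: RPE = actual reward minus predicted value."""
    rpe = reward - q
    return q + alpha * rpe, rpe


def prob_choose_left(q_left, q_right, beta, kappa, prev_choice):
    """Softmax (logistic) probability of choosing option 0 ("left").

    beta  : reward sensitivity (inverse temperature), assumed name
    kappa : perseveration bonus for repeating prev_choice, assumed name
    """
    stick = 0.0
    if prev_choice == 0:
        stick = kappa
    elif prev_choice == 1:
        stick = -kappa
    logit = beta * (q_left - q_right) + stick
    return 1.0 / (1.0 + math.exp(-logit))


def simulate(n_trials, p_reward, alpha, beta, kappa, seed=0):
    """Simulate a 2-choice task with hypothetical reward probabilities."""
    rng = random.Random(seed)
    q = [0.0, 0.0]          # action values for the two options
    prev_choice = None      # no previous choice on the first trial
    choices = []
    for _ in range(n_trials):
        p_left = prob_choose_left(q[0], q[1], beta, kappa, prev_choice)
        choice = 0 if rng.random() < p_left else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice], _ = update_value(q[choice], reward, alpha)
        choices.append(choice)
        prev_choice = choice
    return choices, q


if __name__ == "__main__":
    # Illustrative values only; not taken from the thesis.
    choices, q = simulate(200, p_reward=(0.8, 0.2), alpha=0.3, beta=3.0, kappa=0.5)
    print("fraction of left choices:", choices.count(0) / len(choices))
    print("final action values:", q)
```

In a sketch like this, a lower `alpha` makes the action values follow reward outcomes more slowly, and a higher `kappa` makes the simulated agent more likely to repeat its previous choice regardless of value.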