The basal ganglia (BG) are a collection of nuclei located at the central “core” of the brain (Figure 1), forming the functional bridge between the highest levels of processing in the neocortex, and the motor control networks in the brainstem (midbrain and hindbrain, progressing down to the spinal cord). The primary outputs of the BG are downward to these motor networks, and, via extensive connections into the thalamus, back up to the frontal areas of the neocortex, principally the prefrontal cortex. The striatum is the input nucleus of the BG, anatomically composed of the caudate and putamen, which receive extensive projections from the neocortex and other brain areas.
This central location and connectivity in the brain is consonant with the central role that the BG play in controlling behavior at all levels, from basic motor control to goal-driven plans, according to the Rubicon framework. We use various metaphors to describe this role, from puppet-master to orchestra conductor. One of the central questions in the field is whether the BG is responsible for selecting actions as a kind of winner-take-all (WTA) process (resulting in a single discrete action selected), or whether it instead provides a more graded, continuous, and parallel modulation of downstream motor pathways, with the final action selection taking place in these downstream areas. The latter role (i.e., late selection) is supported by strong computational constraints and considerable extant data, consistent with the puppet-master analogy of many strings being pulled in various directions at the same time.
Figure 1:
Major areas of the basal ganglia (BG) and associated brain areas. The striatum is the input layer to the BG, composed of the caudate and putamen as large-scale anatomical features, with the elongated caudate receiving topographically organized sensory-motor inputs. The striatum receives a wide range of inputs from all over the brain, especially the neocortex, with the right panel showing a functional interpretation of this connectivity for the 3D volume of the striatum, reflected so that anterior is to left (Pauli et al., 2016). The ventral striatum (including the nucleus accumbens, NAc), is the input area for goal-driven, affective circuits (“value”), with adjacent more dorsal areas encoding motor-related value signals. The putamen is largely interconnected with motor areas and associated somatosensory cortex, corresponding to dorsolateral striatum in rodents. The globus pallidus externa (GPe) is the next stage of processing, providing the central “core” integration for the BG, with the GP interna (GPi) sending BG outputs subcortically and, via extensive connections into the thalamus, back up to the frontal neocortex. The substantia nigra pars reticulata (SNr) is also a major output pathway, while the pars compacta (SNc) provides dopamine innervation into the entire BG. The subthalamic nucleus (STN) also receives extensive cortical input from frontal areas, and interacts bidirectionally with the GPe. The amygdala, which is a primary affective / emotion area, is not a part of the BG but interacts extensively with it and is located at the extreme end of the tail of the caudate, and the lateral habenula (LHb) plays an essential role in driving dips in dopamine, based on extensive inputs from the ventral striatum among many other areas.
Figure 2:
Coronal slices through the right hemisphere of rat and NHP (non-human primate, i.e., macaque monkey) striatum, showing the projections from major frontal areas, with functional interpretations based on the Rubicon framework, consistent with the previous figure. The most ventral and medial area receives most strongly from the infralimbic (IL) frontal cortex in rat, which is homologous to Brodmann area 25 in primates. This encodes the more abstract (pure) value of goal outcomes in Rubicon. Going lateral from there are more lateral OFC (orbitofrontal) areas (VOLO = ventrolateral OFC) that encode more concrete, sensory aspects of goal outcomes. Dorsal from there are areas receiving from dorsal ACC (anterior cingulate cortex, Brodmann 24; Cg = cingulate cortex in rats), encoding affective values associated with different actions (mainly costs like effort, uncertainty, etc.). Medial to that is an area receiving from the prelimbic (PL) cortex in rats (Brodmann 32, pregenual ACC in primates), which integrates IL and ACC inputs to compute overall utility (outcome value minus costs). PL/32 projects across most of the medial and ventral region. From Heilbronner et al., 2016.
Figure 2 and the right panel of Figure 1 show how different regions of the striatum interconnect with different frontal cortical areas, with associated functional interpretations consistent with the Rubicon framework (data from Pauli et al., 2016 and Heilbronner et al., 2016). These goal-oriented areas are located in the ventral and medial areas of the striatum, while the progressively dorsal and lateral areas (which comprise roughly 2⁄3 of the total striatum) are more strongly interconnected with motor cortical areas, and dorsolateral prefrontal cortex in primates. These different parts of the BG share a common canonical circuit but also have differences, consistent with the idea that they likewise share a common overall function with differences associated with their different specialized contributions.
There is extensive evidence showing that the dorsolateral BG encodes detailed motor control signals in topographic alignment with corresponding motor areas (e.g., fingers separate from legs, etc), with recent studies able to decode complex naturalistic motor behaviors in freely behaving rats (Markowitz et al., 2018; Klaus et al., 2017; Meng et al., 2018; Yttri & Dudman, 2016; Yttri & Dudman, 2018). Furthermore, studies of animals with complete lesions of the neocortex demonstrate that the BG can drive remarkably intact fine-grained motor control through its descending motor pathway projections (Grillner et al., 2020; Park et al., 2020; Arber & Costa, 2022). In evolutionarily more ancient vertebrates without much cortex, such as the lamprey (which has essentially the same BG structures as the rodent and primate; Grillner & Robertson, 2016), the BG is the primary driver of learning and behavior.
At a neural level, the SPNs (spiny projection neurons; also called medium spiny neurons, MSNs) in the striatum receive the strongest dopamine inputs of any area in the brain, and likewise have the greatest ability to rapidly clear dopamine from the synapse after it has been released. This allows these neurons to learn from rapid phasic changes in dopamine, i.e., bursts and dips relative to the baseline tonic firing level, in a manner consistent with the core principles of reinforcement learning (RL; Markowitz et al., 2023; Howard et al., 2017; Nair et al., 2015; Shen et al., 2008; Frank, 2005).
Thus, the dorsolateral striatum is widely believed to be the neural substrate for the actor component of the brain’s reinforcement learning system (Houk et al., 1995; Barto, 1995), where reward prediction error (RPE) signals from the dopamine critic system train the BG actor to improve motor actions to increase overall reward (see Figure 3 in the RL page). As discussed in the RL page, the most important computational challenge in RL is to find efficient forms of search, to avoid the curse of dimensionality in complex, real-world environments with large motor action repertoires. Detailed aspects of the BG circuitry provide efficient dedicated-weight-gradient-search (DWGS) solutions for online motor control and learning.
In animals with more neocortex, especially in primates and humans, the BG also works intimately in concert with the cortex to organize behavior, by way of its extensive connections into the thalamus. These thalamic projections can provide two functions: direct modulation of cortical activity, which occurs in a very broad manner (which we term gating), and more fine-grained training of cortical learning through the same type of thalamic-driven predictive learning signals used in the Axon error-driven learning model.
Thalamic gating is a central mechanism in the Rubicon framework, for locking in the active maintenance of distributed goal representations across various regions of the prefrontal cortex, driven by the more medial and ventral areas shown in Figure 2.
Thus, these two roles of the BG make it one of the most central brain structures for understanding both basic motor control and more advanced goal-driven planning and higher-level cognition. The circuits that enable it to play these critical roles are explored first in the case of basic motor control below, and then elaborated in terms of the relationships with the prefrontal cortex in that page.
One overarching way of understanding the role of the BG across all these levels is in terms of a bidirectional and modulatory control system, that uses learning across two opponent pathways (Collins & Frank, 2014) to disinhibit output targets, in order to selectively control the flow of activity through these other circuits. Thus, it is a kind of “puppet master” pulling the strings of the brain, to get it to “do the right thing” in order to optimize overall reward, and accomplish desired goals.
In the following sections, the opponent pathway organization of the BG is examined, focusing primarily on the motor control functions supported by the dorsolateral BG. We then describe a new model of the BG that incorporates new patterns of connectivity that have recently been discovered. This PCore model shows how two different types of neurons within the globus pallidus externa (GPe), and projections from GPe back to the striatum, support the ability to integrate information over time during the decision-making process, which was not possible in the purely feedforward, classical model of BG function. Then we review a range of data that provide a coherent overall understanding of what the BG does, in the context of the PCore model.
Finally, with the big picture of BG function in place, we turn to some additional elements of the BG circuitry that play a critical role in shaping learning, which have a compartmental organization in the dorsal striatum. These include the striosomes, which project to dopamine areas instead of the motor and cortex outputs of the matrix neurons that comprise 85% of the striatum and are the main focus of the discussion below, and the CINs (cholinergic interneurons) which are few in number but large in their neuromodulatory impact on the rest of the striatal neurons.
Opponent pathways: D1 vs D2 / Go vs No
Figure 3:
The basal ganglia in a rodent brain, showing the two major pathways (in neurons within the matrix / matrisomes): direct from the striatum to the two output areas of GPi (globus pallidus interna, also known as the entopeduncular (EP) nucleus in rodents) and SNr (substantia nigra pars reticulata, which is larger than GPi and has more of the motor output pathways), and indirect that makes an additional hop through the GPe (globus pallidus externa). Note that the BG represents a big chunk of the rat forebrain. Figure from Gerfen & Surmeier, 2011.
Figure 3 shows the anatomy of the BG within the rodent brain, highlighting two distinct pathways from the matrix population of neurons in the striatum to the output areas of the substantia nigra pars reticulata (SNr) and globus pallidus interna (GPi) (also known as the entopeduncular (EP) nucleus in rodents). These two pathways are the direct (going directly from striatum to the SNr/GPi output) and the indirect, making another stop in the GPe before going on to the output. In the rodent, the SNr is significantly larger and interconnected with motor areas, while the GPi also has projections to the lateral habenula driven by the striosomes as discussed later. We will treat the SNr and motor portions of the GPi as functionally identical for the present purposes. There are also extensive output pathways via the ventral pallidum that are specific to the ventral/medial portions of the system, discussed later as well.
Figure 4:
Classical model of BG direct and indirect pathway function, where the direct pathway inhibits the BG output nuclei (SNr/GPi), thus disinhibiting its downstream targets such as the thalamus (i.e., a net Go permissive effect). The indirect pathway adds one more step, into the otherwise tonically active GPe neurons, so that when it is activated, it inhibits this GPe inhibition, and thus makes the output nuclei more active, increasing or maintaining inhibition on the downstream targets (i.e., a net No or inhibitory effect). The plots adjacent to each step of the pathway illustrate the activity of neurons over time, for the Go case, where the direct pathway neurons are strongly activated and the indirect pathways are not so much. Dopamine drives consistent learning and modulation of the direct and indirect pathways via D1 (Go) vs. D2 (No) receptors, such that bursts of dopamine facilitate the Go pathway and inhibit No, while the opposite holds for dips in dopamine.
Because all of the major neurons in the BG are inhibitory (unlike the cortex, where the principal neurons are excitatory), the direct and indirect pathways end up having opposing effects, a fact which has driven much of the theorizing about BG function. The direct pathway SPN neurons in the striatum (dSPN) directly inhibit the SNr/GPi outputs, which are otherwise tonically active and thus continuously inhibiting their output targets. Therefore, the dSPN direct pathway is net disinhibitory on the output targets, making it function as a “Go” pathway: it allows behavior to proceed, like a green traffic light (Figure 4). This analogy also captures the permissive (modulatory) nature of disinhibition: it doesn’t force anything to happen on its own: it just allows any “traffic” that might otherwise be there to proceed.
By contrast, the additional minus sign for the indirect pathway makes it net inhibitory on BG output targets, and thus a “No” pathway (red light) that tends to prevent behavior from proceeding. You can compute these effects by adding up the number of minus signs in the pathway: if it is even, then the net effect is positive / disinhibitory (two negatives cancel each other out), and if it is odd, then the net effect is negative / inhibitory:
- Direct: dSPN -o SNr/GPi -o Thalamus = 2 (even, net disinhibitory)
- Indirect: iSPN -o GPe -o SNr/GPi -o Thalamus = 3 (odd, net inhibitory)
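The sign-counting rule can be checked mechanically. The following is a minimal sketch for illustration only: the function name `net_effect` and the string encoding of pathways are assumptions, with each `-o` standing for one inhibitory projection.

```python
def net_effect(pathway: str) -> tuple[int, str]:
    """Count inhibitory links ('-o') and return (count, net effect on the final target)."""
    n_inhibitory = pathway.count("-o")
    # An even number of inhibitory links cancels out (disinhibition);
    # an odd number leaves one net inhibition.
    effect = "disinhibitory" if n_inhibitory % 2 == 0 else "inhibitory"
    return n_inhibitory, effect


direct = "dSPN -o SNr/GPi -o Thalamus"
indirect = "iSPN -o GPe -o SNr/GPi -o Thalamus"

for name, pathway in (("Direct", direct), ("Indirect", indirect)):
    count, effect = net_effect(pathway)
    print(f"{name}: {count} inhibitory links, net {effect}")
```

The same parity check applies to any chain of purely inhibitory projections, but it would not apply unchanged to pathways that include excitatory links such as those from the STN.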
The opposing effects of dSPN vs iSPN neurons align with the effects of dopamine neuromodulation via the D1 vs. D2 dopamine receptor types, which are differentially expressed on these neuron subtypes (Shen et al., 2008; Frank, 2005). The D1 dopamine receptor on dSPNs is excitatory and promotes LTP (long-term potentiation; synaptic plasticity) when stimulated with bursts of dopamine. The D2 receptor on iSPNs is inhibitory and promotes LTD from dopamine bursts. Furthermore, the opposite pattern holds for dopamine dips, directly implementing both sides of Thorndike’s law of effect for instrumental conditioning: do more of things that give you dopamine bursts, and less of things that give you dips in dopamine.
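The sign pattern of this dopamine-dependent plasticity can be summarized in a few lines of code. This is a minimal sketch, assuming a simple multiplicative rule in which the weight change is the product of phasic dopamine, synaptic activity, and a receptor-specific sign; the function name, the learning rate, and the product form are illustrative assumptions, not the learning rule used in any particular model.

```python
def dopamine_weight_change(da: float, activity: float, receptor: str,
                           lrate: float = 0.1) -> float:
    """Weight change for a corticostriatal synapse under phasic dopamine.

    da: phasic dopamine relative to tonic baseline (burst > 0, dip < 0).
    activity: assumed product of pre- and postsynaptic activity (>= 0).
    receptor: "D1" (dSPN, Go) or "D2" (iSPN, No).
    """
    if receptor not in ("D1", "D2"):
        raise ValueError("receptor must be 'D1' or 'D2'")
    # D1 synapses potentiate with bursts; D2 synapses do the opposite.
    sign = 1.0 if receptor == "D1" else -1.0
    return lrate * sign * da * activity


for label, da in (("burst", 1.0), ("dip", -1.0)):
    for receptor in ("D1", "D2"):
        dw = dopamine_weight_change(da, activity=1.0, receptor=receptor)
        print(f"{label:5s} {receptor}: {'LTP' if dw > 0 else 'LTD'}")
```

Under these assumptions, a burst strengthens Go and weakens No for the active synapses, and a dip does the reverse, which is the two-sided pattern the law of effect requires.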
Action selection
The simple Go vs. No logic of these two BG pathways aligns with impairments in people with Parkinson’s disease and related basal ganglia disorders known as catatonia, which can be characterized as a specific problem in initiating motor actions. This was compellingly demonstrated in the 1990 movie Awakenings, where a patient (Robert De Niro) could keep walking once he got started, but otherwise could be stuck for hours unable to start. This convergence of circuitry and initiation deficits led several authors to suggest that the primary function of the BG is in action selection: the decision of what action to perform (Albin et al., 1989; Chevalier & Deniau, 1990; Alexander & Crutcher, 1990; Mink, 1996; Redgrave et al., 1999; Gurney et al., 2001; Frank et al., 2001; Brown et al., 2004; Nambu, 2004; Bogacz & Gurney, 2007).
Once a selection has been made, the action can proceed without further input from the BG, explaining the selective initiation deficits in Parkinson’s patients: they cannot select any action. Furthermore, this account has the advantage of distinguishing the contribution of the BG from that of the cerebellum, which is widely thought to be important for rapid online adjustments to motor control signals (Albus, 1975; Ito, 1998; Buonomano & Mauk, 1994), and not for selection or initiation.
Figure 5:
Conceptual model of the BG performing action selection, allowing only one selected action to proceed (via the direct Go pathway), while inhibiting the others via the indirect No pathway (from Gazzaniga et al., 2018).
The simplest form of the action selection model posits that the selected action gets a Go disinhibitory signal from the BG, while all the other unselected actions get a No inhibitory signal, as illustrated in Figure 5.
However, this is not the only way in which action selection can operate, especially at the level of the striatum. For example, one can think of the Go and No pathways in the striatum as contributing graded weighting signals that are effectively voting in favor or against a given action, with downstream processes (in the motor midbrain and spinal cord) being responsible for integrating those votes. Critically, even a selected action can get a number of No votes, as long as it also gets sufficient Go votes. Furthermore, this ability to separately represent the Go vs. No votes has important computational advantages over an alternative that immediately collapses these votes into a single signal (Collins & Frank, 2014).
This opponent-process dynamic of push vs. pull, yin vs. yang, with lots of both going on in most cases, is more consistent with considerable data showing that the direct and indirect pathway neurons associated with a given action are both activated, instead of having fully opposite patterns of activity (Cui et al., 2013; Markowitz et al., 2018; Klaus et al., 2017; Meng et al., 2018). This graded balancing of opponent forces is also consistent with optogenetic studies that clearly show that activation of the direct pathway facilitates action, while activation of the indirect pathway inhibits it (Kravitz et al., 2010; Yttri & Dudman, 2016). The opposing effects of these pathways are also evident in the identifiable and decorrelated behavior of dSPNs vs iSPNs in the above studies.
In summary, an opponent, competitive dynamic between Go and No pathways does not mean that these neurons are fully anticorrelated: as in any good sports game, lots of good moves are made on both sides, and each team can play with strong vigor, even though one or the other ends up winning in the end. Indeed, at the level of muscle forces, there is always a lot of co-contraction taking place among opposing muscles, which is necessary to maintain overall muscle tension and postural stability.
Parallel action selection and online motor control
The action selection framework is often construed in terms of a strongly serial, discrete conception of motor action, where a single discrete action is performed at a given time. However, at the level of the muscles that actually drive motor action, it is actually a highly parallel “symphony” of activity that is in constant motion, with graded contributions. This parallel, graded form of processing is synergistic with the efficient search through motor control space, as noted at the outset. Thus, to the extent that the dorsolateral BG is involved in motor control at the level of muscles, it makes more sense to think of the BG as a conductor of this symphony, providing dynamic, bidirectional modulation of each of the different muscles in order to better coordinate their activity, to accomplish desired motor outcomes.
Under this more dynamic, parallel scenario, the Go vs. No pathways are not discretely and sequentially picking one out of many possible actions, but rather the dynamic balance between Go and No determines the extent to which each muscle or muscle group is selectively disinhibited at any given point in time, in parallel. If there is more Go than No, that pathway will be disinhibited and allowed to act more strongly, while a greater No than Go balance will tend to inhibit the pathway and prevent it from interfering with other muscle activity that should proceed. To return to the puppet-master analogy used above, the strings by which a puppet is controlled need to have bidirectional control: you can’t operate a puppet in zero gravity! In the neural domain, the dynamic Go vs. No competition gives you this bidirectional control.
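A toy rate-coded example makes the parallel, graded picture concrete. This is a minimal sketch under stated assumptions: the three channels, their Go and No values, the tonic level `TONIC`, and the linear form of `bg_output` are all hypothetical, chosen only to show that each channel is modulated independently and in either direction.

```python
TONIC = 1.0  # assumed tonic SNr/GPi rate, arbitrary units


def bg_output(go, no, tonic=TONIC):
    """Per-channel SNr/GPi activity: lowered by Go (direct), raised by No (indirect)."""
    return [max(0.0, tonic - g + n) for g, n in zip(go, no)]


# Three hypothetical muscle channels, each with simultaneous Go and No activity.
go = [0.75, 0.25, 0.5]
no = [0.25, 0.75, 0.5]

output = bg_output(go, no)
for i, rate in enumerate(output):
    if rate < TONIC:
        state = "disinhibited"
    elif rate > TONIC:
        state = "more inhibited"
    else:
        state = "unchanged"
    print(f"channel {i}: output {rate:.2f} -> downstream target {state}")
```

Note that every channel here has both Go and No activity, and no channel is "selected" in a winner-take-all sense; the output is simply a graded signal per channel.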
This conception of motor control converges with the conclusions of numerous studies on the descending pathways from the BG to motor output areas in the midbrain and spinal cord, as summarized in recent reviews (Grillner et al., 2020; Park et al., 2020; Arber & Costa, 2022). For example, Park et al. (2020) use the analogy of the function of a graphical equalizer in a stereo system, which modulates the strength of different frequency bands, and also note that many of these motor control studies specifically find that BG lesions affect dynamic online motor control but not action initiation. Indeed, it is likely that the catatonia-like initiation deficits arise from active imbalances in Go vs. No pathways (more No than Go), whereas full lesions just remove the BG modulation entirely.
In addition, one of the considerations that has led to the more serial, discrete conception of BG function is the extreme funnel-like compression of the neural signal as it flows through the network, with only roughly 30k neurons in each hemisphere in the SNr/GPi output nuclei, versus about 2.8 million in the striatal inputs (Oorschot, 1996). However, 30k is still a large number compared with the number of distinct muscles (roughly 600 in the human and a similar order of magnitude in the rat), although this 30k number includes all of the different pathways through the BG, so the motor control portion may be more like 10k or so. It is clear from anatomical studies that different pathways from the BG output project to different motor control centers in the midbrain, consistent with a parallel modulation of the descending motor system (Arber & Costa, 2022).
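The rough arithmetic behind this argument can be spelled out explicitly. The sketch below uses only the approximate figures quoted in the text; the variable names are illustrative, and the per-muscle ratios are order-of-magnitude estimates rather than measured quantities.

```python
STRIATUM_NEURONS = 2_800_000   # striatal input neurons per hemisphere (Oorschot, 1996)
OUTPUT_NEURONS = 30_000        # SNr/GPi output neurons per hemisphere
MOTOR_OUTPUT_NEURONS = 10_000  # rough motor-only share suggested in the text
MUSCLES = 600                  # rough number of distinct muscles

compression = STRIATUM_NEURONS / OUTPUT_NEURONS
per_muscle_all = OUTPUT_NEURONS / MUSCLES
per_muscle_motor = MOTOR_OUTPUT_NEURONS / MUSCLES

print(f"striatum-to-output compression: about {compression:.0f} to 1")
print(f"output neurons per muscle (all pathways): about {per_muscle_all:.0f}")
print(f"output neurons per muscle (motor share only): about {per_muscle_motor:.0f}")
```

Even with the more conservative motor-only estimate, there are on the order of ten or more output neurons per muscle, which leaves room for parallel, graded modulation rather than a single discrete selection signal.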
As we review in detail below, recordings of the activity of neurons in the BG output pathways would seem to provide a more definitive understanding of what it is contributing to motor control, relative to the strong focus in the field on properties of the striatal input neurons. The relatively few such studies of SNr and GPi neurons clearly support the parallel, bidirectional modulation model (Barter et al., 2015; Freeze et al., 2013; Gulley et al., 1999), with individual neurons having graded activity strongly associated with distinct motor pathways and positions of the muscles in those pathways.
Figure 6:
Parallel loops through the BG circuit in the mouse, supporting a highly parallel search process for learning to control the motor and other brain areas interconnected with the BG. The SNr (Nigra) output integrates related striatal signals, presumably around specific motor effectors (muscle groups), with the data of Foster et al. showing strong convergence from direct and indirect pathways onto the same SNr neurons. Note that the anatomical configuration of SNr is inverted relative to striatum, with ventral areas of SNr processing dorsal striatum inputs. The parafascicular nucleus (PF, part of the intralaminar thalamus) provides an important credit assignment feedback signal to the striatum in the PCore model. From Foster et al., 2021.
Furthermore, detailed anatomical pathway tracing through all of the circuits of the BG, and into and out of the cortex (Figure 6; Foster et al., 2021), shows a remarkably topographic set of parallel loops through the BG, consistent with earlier reports (Alexander et al., 1986; Haber, 2003). The SNr output integrates related higher-dimensional striatal inputs to produce a continuously valued, motor-specific signal reflecting the individual “votes” across all of these related neurons. See compartmental organization below for further details on these parallel pathways, and ventral pallidum for other pathways specific to the ventromedial BG.
While the relatively low-dimensional, parallel, graded BG output signals make sense from the descending motor control perspective, how does this kind of signal work in the context of ascending projections through the thalamus and back up to the neocortex, where there are many more millions of neurons? And how does the BG output coordinate with the direct descending projections from cortical motor areas into the same midbrain and spinal motor areas?
If we retain the parallel, muscle-based conception of motor control in the cortex, then the BG ascending projections can be playing a similar puppet-master / conductor / traffic light role in modulating these muscle-based pathways as they go through the thalamus: even if there are many more traffic lanes, the same low-dimensional Go vs. No control can still provide useful control over the flow of signals through these lanes. Furthermore, this modulation can drive error-driven learning in cortex, in much the same way that thalamic projections through the pulvinar nucleus drive predictive learning in posterior cortex.
After we explore our updated, detailed model of the BG circuitry in the next section, we return to these big-picture questions of overall BG function, across the multiple different pathways and domains in which it operates. The available evidence strongly suggests that these different pathways have different functional properties, with neurons in the ventral and medial areas having more coherent, discrete behavior, while the dorsolateral motor control areas are more parallel, graded and modulatory. How do these different modes of behavior emerge from the same type of circuitry? Are there other strong commonalities among these apparent differences? These are some of the questions we revisit in light of the detailed model.
The Pallidal-core (PCore) model
Although there were earlier indications of inaccuracies and omissions in the classical direct vs. indirect pathway model (Figure 4), relatively recent molecular labeling techniques have now provided definitive evidence for a new anatomical model, which puts the GPe in a more central role in shaping the dynamics of the BG (Courtney et al., 2023 (review); Mallet et al., 2012; Saunders et al., 2018; Cui et al., 2021; Dodson et al., 2015; Abdi et al., 2015; Guilhemsang & Mallet, 2024; Nambu & Chiken, 2024). Suryanarayana et al. (2019) developed a version of the earlier action-selection model of Gurney et al. (2001) incorporating the new GPe anatomy, which informed the PCore model described below.
Figure 7:
PCore model of the basal ganglia, which is centered around the multiple projections into and out of the GPe neurons that affect every other part of the BG circuitry, putting the GPe Pallidum at the core of its function. The GPeAk (arkypallidal) neurons receive from the direct pathway striatal neurons (dSPN), while the prototypical (GPePr) neurons receive from the indirect pathway, as in the classical model (hence the name). Because GPeAk projects inhibition back up to the striatum, it must be inhibited in order to disinhibit the SPN neurons, which is accomplished by the direct pathway inputs. The iSPN neurons can also get some relief by inhibiting the GPeAk in cases where they are more active, and are directly inhibiting the dSPNs (but not the other way around). The hyperdirect pathway into the STN drives initial “brakes” on the system preventing premature responding. The numbers below each nucleus indicate the rough number of neurons in each hemisphere in a rat (Oorschot, 1996).
Our implementation of this new circuitry is summarized in Figure 7, showing two of the main subtypes of GPe neurons: GPeAk are the arkypallidal GPe neurons, which express the molecular markers NPAS1 and FOXP2, while the GPePr are the prototypical GPe neurons that have a connectivity pattern similar to the GPe in the classical model, and express PV (parvalbumin, as discussed in inhibition) and KCNG4. As shown in the figure, roughly 45% of the GPe neurons are prototypical, while 18% are arkypallidal, with another 12% projecting to the SNc dopamine area (similar to striosome neurons in the striatum, which we discuss below). The remaining neurons constitute a more heterogeneous group, which we ignore for the time being.
The striatum remains largely as in the classical model, with the one new wrinkle that the lateral inhibition among the SPN neurons is strongly asymmetric, with iSPNs inhibiting dSPNs but not the other way around (Taverna et al., 2008), which is compatible with the remainder of the dynamics from the GPe.
Unlike the classical model, the GPe receives significant input from the direct pathway dSPN neurons, which strongly favors the GPeAk arkypallidal neurons. Meanwhile, the prototypical GPe neurons are so-named because they almost exclusively receive input from the iSPN neurons, as in the classical model. The GPeAk neurons also diverge from the classical model by projecting back up to the striatum, which provides a key insight into their function. Meanwhile, the prototypical GPe neurons inhibit themselves and the GPeAk neurons, while also sending the classically described inhibitory projection to the output nuclei (SNr/GPi).
This circuit converging on the GPeAk neurons effectively replicates the classical direct vs. indirect dynamic (Figure 4), but with the GPeAk as the target of direct inhibition and indirect disinhibition instead of the SNr/GPi output neurons. This allows the GPeAk neurons to compute an integrated balance of the Go vs. No forces, which then feeds back up into the striatum, which enables this internal circuit to drive dynamic integration processing that is otherwise not possible in the strictly feedforward classical model. This is the “core” of the PCore model.
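The way GPeAk activity reflects the Go vs. No balance can be illustrated with a toy rate computation. This is a minimal sketch, assuming threshold-linear units and arbitrary tonic levels; the function name `gpe_ak_rate`, the parameter values, and the unit connection weights are hypothetical and are not taken from the PCore implementation.

```python
def gpe_ak_rate(dspn: float, ispn: float,
                tonic_pr: float = 1.0, tonic_ak: float = 1.5) -> float:
    """Toy GPeAk rate given dSPN (Go) and iSPN (No) activity.

    iSPN inhibits GPePr, and GPePr inhibits GPeAk (indirect disinhibition);
    dSPN inhibits GPeAk directly.
    """
    gpe_pr = max(0.0, tonic_pr - ispn)           # iSPN -o GPePr
    return max(0.0, tonic_ak - dspn - gpe_pr)    # dSPN -o GPeAk, GPePr -o GPeAk


go_case = gpe_ak_rate(dspn=0.75, ispn=0.25)   # Go > No
no_case = gpe_ak_rate(dspn=0.25, ispn=0.75)   # No > Go

print(f"GPeAk with Go > No: {go_case:.2f}")
print(f"GPeAk with No > Go: {no_case:.2f}")
```

In this sketch, GPeAk is silenced when Go dominates and remains active when No dominates, so its inhibitory feedback onto the striatum is weaker in the Go case, which is the direction needed for the integration process described above.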
Finally, the STN (subthalamic nucleus) plays a critical role in the BG circuit (and is a major target of therapeutic treatments in Parkinson’s disease), supporting the hyperdirect projection from the cortex and uniquely sending excitatory glutamatergic projections to the GPe and SNr/GPi neurons. This set of connections has long been recognized as important for providing an initial “brake” on any disinhibitory effects of the BG, by driving an initial burst of excitation to the SNr/GPi outputs that are inhibiting the downstream targets of the BG (Frank, 2006). An alternative role for the STN is as a normalizing factor to retain the sensitivity of the system under different strengths and numbers of inputs (Gurney et al., 2001; Gurney et al., 2001; Gurney et al., 2015; Bogacz & Gurney, 2007), which may characterize a different subset of STN neurons.
In the PCore model, the STN projections into the GPePr are critical for driving a reciprocal inhibitory reflection back from GPePr to STN, which opens up a window where the SNr/GPi outputs can actually become net inhibited, because a subset of the STN neurons go into an extended pause in firing after their initial burst of activity (Fujimoto & Kita, 1993; Magill et al., 2004). This excitatory-inhibitory relationship between STN and GPePr is widely implicated in the tremor associated with Parkinson’s disease (Bevan et al., 2002; Nevado-Holgado et al., 2014; Lindahl & Kotaleski, 2016).
The STN also projects excitation to the GPeAk, which causes it to more strongly inhibit the striatum. This provides a mechanism for the notably transient, phasic nature of SPN firing (as shown in the neural recording data below). The tonic activity of GPeAk keeps the SPNs inhibited, and the increased excitation from the STN associated with an increase in cortical activity compensates for the increased activity that the SPNs also receive. Then, after their activity pause, the recovery in STN firing activates the GPeAk again, terminating the transient SPN activity (and putting the brakes back on in the BG output nuclei). See STN for more details and discussion of the neural basis for these dynamics; there are multiple cell types in the STN and it is likely doing multiple things.
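The brake-then-pause logic of the STN can be illustrated with a two-phase toy calculation. This is a minimal sketch under strong simplifying assumptions: it considers only the STN excitation and dSPN inhibition converging on the output nuclei, ignores the GPe entirely, and uses hypothetical rate values and a hypothetical `snr_rate` function.

```python
TONIC = 1.0    # assumed tonic SNr/GPi rate, arbitrary units
DSPN_GO = 0.75 # assumed dSPN (Go) inhibition arriving at the output nuclei


def snr_rate(stn: float, dspn: float, tonic: float = TONIC) -> float:
    """Toy SNr/GPi rate: tonic drive plus STN excitation minus dSPN inhibition."""
    return max(0.0, tonic + stn - dspn)


# STN drive relative to its own baseline in two phases of the response.
stn_phases = {"burst": 1.0, "pause": 0.0}
rates = {phase: snr_rate(stn, DSPN_GO) for phase, stn in stn_phases.items()}

for phase, rate in rates.items():
    gate = "open (output inhibited)" if rate < TONIC else "closed (brake on)"
    print(f"STN {phase}: SNr/GPi rate {rate:.2f}, gating window {gate}")
```

With the same Go signal in both phases, the output stays above its tonic level during the STN burst and falls below it only during the pause, which is the sense in which the pause opens a gating window.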
Figure 8:
Dynamics of the PCore model for a Go > No case (left panel) vs a No > Go case (right panel), illustrating the central role of the GPeAk neurons. Each layer shows a raster plot of spikes across the 25 neurons per layer, with time going back in depth for each layer. Thus, you can see the initial burst of activity in the STN driven by the hyperdirect pathway inputs, which puts on the “brake” at the start, allowing the rest of the dynamics to unfold, as the brake is released by GPePr inhibiting the STN. See text for further explanation of each of the steps highlighted. The striatum neurons are labeled as Mtx, representing the matrix (vs. striosome) subset, with dSPN = Go and iSPN = No.
Figure 8 shows the PCore model in action (see the BG ventral simulation) in cases where it ends up being net disinhibitory (“Go”) and net inhibitory (“No”). For the Go case, the key steps are:
1. The STN hyperdirect activity provides a burst of activation to the SNr/GPi, GPePr, and GPeAk neurons, effectively preventing the SNr/GPi output from being inhibited. This is the initial “brake” on the system, which is released when the GPePr inhibits the STN in turn, and the SKCa calcium-gated K channels produce a longer-lasting inhibitory pause, providing a window for the BG to control the output pathways (i.e., a gating window).
2. Stronger learned weights from the “ACC” inputs (which are just clamped input layers in this model) to the dSPN cause these neurons to respond more vigorously than the opponent iSPN neurons.
3. This dSPN activity directly inhibits the GPeAk, initiating a “positive” disinhibitory feedback loop that facilitates further overall “Go” mode activity. This is effectively a kind of amplifier circuit, which is important for enabling the Go vs. No distinction to be made across a wide range of different input strengths.
4. The reduced GPeAk activity disinhibits the striatum neurons (both dSPN and iSPN), which allows the dSPN pathway to ramp up further, although this is also tempered by increased iSPN pathway firing which directly inhibits the dSPN cells.
5. The increased striatal dSPN activity inhibits the SNr/GPi neurons…
6. Thereby accomplishing the classical Go pathway dynamic of disinhibiting the downstream targets of the BG, in this case the thalamus.
7. This phasic output disinhibition is terminated when the STN neurons recover from their activity pause, thus exciting the GPeAk, which in turn inhibits the SPNs. The brakes also go back on in the SNr/GPi output nuclei.
The case illustrated on the right shows what happens when the iSPN No pathway gets more excited by the inputs.
1. Now the iSPN neurons are more active initially.
2. Which inhibits the GPePr, thereby disinhibiting the GPeAk.
3. Thus the continued GPeAk activity inhibits the striatum, preventing the dSPN Go pathway from getting more active.
4. Therefore, the SNr/GPi remains active…
5. And the thalamus remains inhibited by the SNr/GPi.
In summary, the GPeAk neurons integrate the direct dSPN pathway inputs relative to the iSPN inputs via GPePr, providing an internal integration of this balance that then feeds back to modulate the activity of the striatal input pathways. When GPeAk is net inhibited, it is more likely to amplify the dSPN pathway and drive overall disinhibition (“Go”), whereas the opposite holds when GPeAk is disinhibited via iSPN inhibition on GPePr. There are also direct excitatory cortical projections into the GPeAk that are functionally consistent with the STN hyperdirect inputs, providing a baseline against which these differential dynamics operate.
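The sign of each connection in this summary can be illustrated with a static rate-code calculation. This is only a minimal sketch and not the PCore implementation, which uses spiking neurons and the STN burst-then-pause dynamics: the function name `gpe_ak_rate`, the tonic levels, and the unit connection weights are all assumptions chosen for illustration.

```python
def gpe_ak_rate(dspn, ispn, tonic_pr=1.0, tonic_ak=1.5):
    """Toy steady-state rates for GPePr and GPeAk (all values hypothetical)."""
    # iSPN (No) activity inhibits the tonically active GPePr.
    gpe_pr = max(0.0, tonic_pr - ispn)
    # GPeAk is inhibited directly by dSPN (Go) and by GPePr, so iSPN
    # activity disinhibits GPeAk by reducing GPePr firing.
    gpe_ak = max(0.0, tonic_ak - dspn - gpe_pr)
    return gpe_pr, gpe_ak


if __name__ == "__main__":
    for label, dspn, ispn in [("Go > No", 0.8, 0.3), ("No > Go", 0.3, 0.8)]:
        pr, ak = gpe_ak_rate(dspn, ispn)
        print(f"{label}: GPePr={pr:.2f} GPeAk={ak:.2f}")
```

Under these assumed values, GPeAk activity falls when dSPN input exceeds iSPN input and rises in the opposite case, which is the balance signal that the text describes as feeding back onto the striatum.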
This ability for the direct and indirect pathways to interact through their convergence in the GPe resolves one of the main problems with the classical theory: there is only sparse, weak inhibitory connectivity among SPN neurons in the striatum, which seems unlikely to support a robust inhibitory competition and selection dynamic (Tunstall et al., 2002). Because the GPe is much more compact in size than the entire striatum (i.e., roughly 46,000 neurons in the GPe in a rat, compared to roughly 2.8 million in the striatum, as shown in Figure 8; Oorschot, 1996), it is therefore much easier for these pathways to interact competitively with each other, relative to neurons having to directly compete in the much larger space of the striatum.
Figure 9:
Results of optogenetic selective activation of iSPN (a) or STN (b) neurons. Activating iSPNs inhibited GPePr (“Proto”) neurons, while disinhibiting GPeAk (“Arky”) neurons. Activating the STN much more strongly activated GPePr vs GPeAk neurons, consistent with a stronger projection to GPePr.
These dynamics are consistent with various recorded patterns of neural activity, for example in a recent optogenetic stimulation experiment (Ketzef & Silberberg, 2021) that selectively activated iSPN neurons or STN neurons (Figure 9). A number of other studies report consistent data on the effects of GPeAk activation and inhibition of motor action (Mallet et al., 2016; Glajch et al., 2016; Pamukcu et al., 2020; Gu et al., 2020; Tachibana et al., 2008; Dodson et al., 2015).
From an implementational perspective, GPe neurons are simulated using the same Axon spiking neuron model, except that their tonic activity is simulated by providing a tonic excitatory conductance.
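As a rough illustration of how a tonic excitatory conductance yields tonic firing, the following sketch uses a generic conductance-based integrate-and-fire unit. It is not the Axon neuron model itself: the reversal potentials, leak conductance, threshold, reset, and the function name `count_spikes` are illustrative assumptions in normalized units.

```python
def count_spikes(g_e_tonic, n_steps=1000, dt=0.1):
    """Count spikes of a toy integrate-and-fire unit with tonic excitation."""
    e_leak, e_exc = 0.3, 1.0       # assumed reversal potentials (normalized)
    g_leak = 0.1                   # assumed leak conductance
    v_thresh, v_reset = 0.5, 0.3   # assumed spike threshold and reset
    v = e_leak
    spikes = 0
    for _ in range(n_steps):
        # Net current: leak pulls toward e_leak, tonic excitation toward e_exc.
        i_net = g_leak * (e_leak - v) + g_e_tonic * (e_exc - v)
        v += dt * i_net
        if v >= v_thresh:
            spikes += 1
            v = v_reset
    return spikes


if __name__ == "__main__":
    for g_e in (0.0, 0.1, 0.2):
        print(f"tonic g_e={g_e:.1f}: {count_spikes(g_e)} spikes")
```

With no tonic conductance the unit rests at the leak potential and stays silent, whereas a sufficiently large tonic conductance raises the equilibrium potential above threshold and produces regular firing without any synaptic input.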
Functional benefits of PCore
At a broad-brushstroke level of analysis, the PCore model exhibits similar behavior to the classical BG model, with a strong opponent dynamic between a net Go vs. a net No pathway. However, the internal integration of the Go vs. No balance within the GPeAk neurons, along with similar other balancing interactions (e.g., iSPN inhibition of dSPNs, and self-inhibition of the GPePr neurons) makes it considerably more robust than the classical model, and allows it to exhibit important temporal dynamics reflecting the relative balance of these pathways.
Figure 10:
Testing results from the ventral BG model, trained with dopamine based on reward prediction error to do Go gating when the input signals indicate that more positive reward than negative cost is available, and No when the opposite is true. The testing sweeps through increments of negative costs in an inner loop, and positive rewards in the outer loop, as shown on the lower portion of the plot. The Gated line shows the proportion of times that the model did Go gating, which is strongly determined by the ratio of positive to negative, across the full range of these values. This demonstrates the balanced nature of the interactions between pathways. The RT line shows the normalized number of cycles taken when a Go gating outcome occurred, showing that the model was significantly slower in processing the cases with greatest conflict, where positive is very close to negative. Furthermore, the overall trend is that with stronger positive values, RT is overall faster. These patterns are widely observed in decision-making studies, and predicted by normative drift-diffusion models. This demonstrates that the model naturally exhibits an information-integration dynamic to accumulate more input over time when the decision is more ambiguous.
The BG ventral simulation explores these dynamics in a simple decision-making context, which demonstrates that the system can apply a ratio-based decision threshold across a wide range of raw input activation strengths, consistently exhibiting disinhibitory Go dynamics when the relative strength of initial Go vs. No pathway activation favors Go (Figure 10). In addition, as the Go vs. No balance gets closer, the system takes longer to make a decision, allowing more time for additional input signals to be integrated and improve the overall quality of the resulting decision.
This temporal dynamic is consistent with the normative drift-diffusion model of decision making, which has been identified with the function of the basal ganglia empirically (Yartsev et al., 2018; Dunovan et al., 2015; Doi et al., 2020; Ding & Gold, 2013) and theoretically (Bogacz & Gurney, 2007; Bogacz et al., 2016).
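For readers unfamiliar with the drift-diffusion model, the sketch below simulates its textbook form: noisy evidence accumulates toward one of two bounds, with the drift rate standing in for the Go vs. No balance. It illustrates the normative model only, not the PCore circuit; the parameter values and the names `ddm_trial` and `summarize` are assumptions made for this example.

```python
import random


def ddm_trial(drift, rng, threshold=1.0, noise=1.0, dt=0.01, max_steps=100_000):
    """Simulate one drift-diffusion trial; returns (chose_go, decision_time)."""
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        # Euler step: deterministic drift plus Gaussian noise scaled by sqrt(dt).
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if abs(x) >= threshold:
            break
    # If no bound was reached within max_steps, the sign of x is used.
    return x > 0.0, t


def summarize(drift, n_trials=500, seed=0):
    """Proportion of Go choices and mean decision time for a given drift."""
    rng = random.Random(seed)
    results = [ddm_trial(drift, rng) for _ in range(n_trials)]
    p_go = sum(go for go, _ in results) / n_trials
    mean_rt = sum(t for _, t in results) / n_trials
    return p_go, mean_rt


if __name__ == "__main__":
    for drift in (0.1, 2.0):  # near-balanced vs. strongly Go-favoring input
        p_go, mean_rt = summarize(drift)
        print(f"drift={drift}: P(Go)={p_go:.2f}, mean RT={mean_rt:.2f}")
```

A small drift (Go and No nearly balanced) gives slower and less consistent decisions, while a large drift gives fast and reliable Go choices, which is the qualitative pattern described for Figure 10.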
In the “hold-your-horses” model of STN function (Frank, 2006), this ability to modulate the BG decision making speed as a function of ambiguity was attributed to the sustained activity of the STN in response to ambiguous activity patterns in the cortical inputs to the STN. By contrast, this dynamic arises in the PCore model from the GPeAk integration and feedback loop mechanism, which has the important advantage of being based directly on the striatal dSPN vs. iSPN responses to the cortical inputs, rather than the cortical representations themselves.
The STN’s primary function in the PCore model is to provide the burst-then-pause firing pattern that opens a phasic window for the BG to successfully inhibit the output nuclei and disinhibit downstream targets. This ensures that the BG only drives transient, phasic effects, consistent with the data reviewed below.
The action selection model of Gurney et al. (2001) (Gurney et al., 2001; Gurney et al., 2015; Bogacz & Gurney, 2007; Humphries & Gurney, 2021) instead requires a sustained, proportional activity from the STN, so it is clear that these different models make distinctive predictions about STN activity, making this an important topic for further empirical tests. It is also possible that phasic and sustained modes of STN activity are supported by different subtypes of STN neurons.
Finally, the GPeAk disinhibition of striatum strongly predicts that there will be some correlation in dSPN and iSPN firing, even though they are in an overall oppositional relationship, consistent with observed data (Cui et al., 2013). However, the oppositional relationship is still evident overall, consistent with the properties of dSPN and iSPN firing in naturalistic behavioral contexts (Markowitz et al., 2018; Klaus et al., 2017).
Overall, several features of the PCore circuit produce behavior consistent with an action-initiation role, including the brake-then-release dynamics in the STN, and the progressive positive feedback loops between dSPN and GPeAk. Nevertheless, the balanced dynamics should also support generating more graded, balanced combinations of net disinhibitory tone, consistent with an ability to provide ongoing modulatory control over motor actions as they unfold.
Functional contributions of the BG
Figure 11:
Circuits involving the BG, from the frontal cortex to descending motor areas, arranged schematically according to the rat anatomy (as in earlier Figures 2 and 3). The ventral and medial circuits are as described in Figure 2 (V = ventral, M = medial, D = dorsal, L = lateral, S = striatum, aIC = anterior insula). Primary and secondary motor areas target dorsal and lateral striatum, and anterior motor areas (a weak analogue of the much larger dlPFC in primates) target dorsal medial striatum. The SNr output also has a topographic map as shown. The thalamic nuclei targeted by DLS outputs are ventral medial and ventral lateral, which also interconnect with the cerebellum. Brainstem motor targets include the superior colliculus (SC; orienting and approach), medullary reticular formation (MRF; locomotion), and pontine reticular formation (PRF; limbs, digits).
Moving back out from the circuit dynamics level, we return to the big picture question of what the BG contributes to overall brain function, across the different domains in which it is involved. Figure 11 provides a summary of some of the major patterns of connectivity between the BG and the rest of the brain, in the rat. The ventral and medial portions of the striatum were already discussed more in Figure 2, and Figure 11 fills in some of the circuits involving the dorsal and lateral motor areas, continuing through the nuclei in the thalamus (VM = ventromedial, VL = ventrolateral) and some of the brainstem motor areas that the BG output disinhibits.
The big picture here is that the BG is positioned with its “fingers on the buttons” of every major pathway of brain function, from motor to motivation. Figure 11 does not show the extensive sensory projections into the BG from posterior areas of the neocortex, which provide abstracted environmental state information according to the standard reinforcement learning (RL) paradigm. The simple story is clearly that the BG is where RL happens in the brain, with phasic dopamine signals driving learning to optimize behavior.
To understand in more detail what each area is contributing, we examine task-related neural activity in the ventromedial (VM) and dorsolateral (DL) BG areas in the following sections. This data shows that VMS neurons are active at the start and end of a chunk of behavior, but not at critical decision points along the way. This is consistent with the Rubicon framework, reflecting a single goal-engaged step. Thus, the BG is not just a simple model-free form of RL, but rather supports a proactive, goal-driven form of model-based RL that organizes behavior over time to accomplish goals selected at the start of a goal-engaged window.
Meanwhile, in the DLS, we look at what the motor encoding of actions looks like, and also trace that back to the relatively rare recordings from the output pathways.
Goal-related activity in VS
Figure 12:
Activity of neurons in the ventromedial striatum (VMS) of the rat, recorded in a simple cued T-maze task, over the full course of acquisition and re-acquisition with a different cue modality (AA1 = initial auditory cue acquisition, with AA2 and AO showing further training, followed by TA = tactile cue). VMS neurons consistently fire right at the start, and at the end, but not at the critical decision point where the rat senses the cue and decides to go into the right or left arm. This is consistent with the gating of an overarching goal / plan at the start, which is then updated with the actual outcome at the end. The top figures show z-scored data over the population of neurons, while lower panels show actual firing rates and raster plots of individual trials, recording individual neural spiking for the click-responsive neurons and the reward responsive neurons. Activity is generally very sparse, and clearly strongly driven by the warning cue and reward onset, respectively. Figure adapted from Atallah et al., 2014.
Figure 12 shows the activity of neurons in the ventromedial striatum (VMS) of a rat as it learns to navigate a T maze, which is one of the most widely studied behavioral paradigms (Atallah et al., 2014). The rats are placed at the start location, and an auditory warning click (WC) signals that the trial is about to start, at which point a gate opens. As they run down the first arm, the rats then hear another auditory cue that indicates whether they should turn left or right at the T junction, to get a food reward. The data clearly shows that one subset of VMS neurons responds strongly and phasically to the warning click, consistently throughout the different phases of training. Then two other subsets of neurons respond in anticipation of the reward at the goal, or after reward receipt.
What is not observed are neurons responding to the discriminative cue that actually tells the rats which way they should turn to get the reward. This striking lack of responding, present from the start, is consistent with the idea that the VMS neurons are playing a phasic role at the initial goal-selection phase, triggered by the warning click, where they activate a plan that encompasses the behavioral strategy of using the cue to determine which way to turn. Thus, there is no further updating of that plan when the cue occurs. At the end, there is activity tracking the expectation of a reward, and after receiving the reward, which presumably are important for updating and clearing the goal representations based on what actually happened.
The arm maze simulation shows how the Rubicon model exhibits this same pattern of behavior in making similar kinds of behavioral choices. It is also important to note that this activity pattern is inconsistent with a simpler kind of model-free RL that would be predicted by the TD algorithm. This algorithm would predict that over the course of learning, activity would transition from the discriminative cue back to the initial warning click (i.e., a “backup”), as the chaining-like dynamic of temporal difference error signals moves progressively backward in time to find the earliest reliable predictor of reward.
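The backward-moving TD prediction can be made concrete with a minimal tabular TD(0) sketch. This illustrates the standard algorithm that the VMS data argue against, not the Rubicon model; the states are simply successive time steps from the warning click (state 0) to the rewarded goal, and the function name `td_errors`, the number of states, and the learning parameters are assumptions for this example.

```python
def td_errors(n_trials, n_states=5, alpha=0.1, gamma=0.9):
    """Run tabular TD(0) over repeated trials; return per-trial error traces."""
    values = [0.0] * (n_states + 1)  # final entry is the terminal state (value 0)
    history = []
    for _ in range(n_trials):
        # Trial onset arrives from an unpredictable inter-trial state whose
        # value is held at 0, so the onset error is just gamma * V(first state).
        deltas = [gamma * values[0]]
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            delta = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return history


if __name__ == "__main__":
    history = td_errors(500)
    for trial in (0, 20, 499):
        print(trial, [round(d, 2) for d in history[trial]])
```

On the first trial the only nonzero error is at reward delivery; with training, the error shrinks there and comes to be concentrated at trial onset, the earliest reliable predictor of reward.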
Motor activity in DL
Figure 13:
Averaged traces of neural activity (yellow colors) and inactivity (blue) of dSPNs and iSPNs in the DLS of freely behaving mice, associated with different actions as labeled. Each line represents the behaviorally synchronized average for an individual neuron. There is bidirectional modulation of neural activity in both pathways specific to each action, straddling the onset of these actions. Any given neuron has a relatively brief phasic window of activity, which reliably occurs at about the same relative point in time across instances of the behavior. From Markowitz et al., 2018.
Figure 13 shows neural activity in the dorsolateral striatum (DLS) of freely behaving mice associated with different actions that they naturally exhibited (Markowitz et al., 2018). The most evident properties of these responses are:
• Individual SPN neurons exhibit reliable, precisely timed, and brief windows of activity and inhibition in relationship to distinct motor actions, spanning the time window from slightly anticipating the action onset (by roughly 50-70 ms) to roughly 1 s post-action. This is very different from the VS neurons shown above.
This means that the SPN activity is clearly not exclusively or even predominantly involved in action initiation, thereby raising significant questions about the action selection model. However, further experiments reported in Markowitz et al., 2018 showed that DLS inactivation resulted in nearly identical performance of individual actions, but the overall pattern of actions was impacted, and the ability to shape action to avoid a noxious smell was impaired. Other analyses showed that neural activity was conditional on the sequential context in which an action is performed. Thus, this BG activity is likely important for a more graded, parallel, ongoing shaping of motor actions that ends up determining which actions are performed, as discussed earlier.
The brief windows of activity (which were present in individual trials, not just the averaged data) are consistent with the STN gating window dynamic of the PCore model, as discussed in more detail in STN. Given that each neuron recorded has a slightly different such peak, this suggests that there are strongly parallel microcircuits through the BG system.
• The activity profiles of dSPN and iSPN neurons are essentially identical in terms of the average traces shown in the figure. However, statistically, the individual firing behavior of iSPN and dSPN neurons is decorrelated, consistent with the basic connectivity and overwhelming evidence supporting the fundamental Go vs. No distinction between these neurons.
• The participation of an individual SPN across different action types is consistent with a sparse distributed representation, with representational overlap matching action similarity.
Figure 14:
Recordings from neurons in the SNr BG output nucleus, whose firing is strongly correlated with the X,Y head position of a mouse. Note the graded encoding of these continuous head position variables in terms of neural firing rate. Other neurons were correlated with other continuous motor variables (e.g., velocity), and anti-correlated as well as positively correlated cases (as shown) were present. From Barter et al., 2015.
Figure 14 provides a complementary picture of the activity of neurons in the BG output pathway of the SNr, in the dorsolateral motor area, from Barter et al. (2015). It clearly demonstrates that some of these neurons provide a graded firing rate signal that closely tracks continuous motor outcome variables such as head position, consistent with the idea that these neurons provide relatively direct mappings onto muscle groups that control different parts of the body.
Furthermore, at the level of these outputs, for these neurons, the brief phasic activity that is present in the striatal neurons is not evident, and instead there is a seemingly continuous modulation of neural activity. This could reflect the integration of many overlapping striatal neuron contributions across time, producing a continuous overall effect. Also see Basso & Wurtz (2002) for recordings of SNr neurons with eye movement correlates, which do not fire in advance of the corresponding motor areas in the colliculus, but do have a continuous activity modulation.
Taken together, this data is overall strongly consistent with a graded, dedicated-weight-gradient-search (DWGS) of the motor action space, which thus supports an efficient gradient-based search process through the space of such actions over the course of learning. The BG dorsal simulation explores how this learning can efficiently search through relatively high-dimensional action sequence space to learn arbitrary action sequences.
Thalamic modulation back to cortex
The strong version of the cortically focused model of action selection holds that the BG is critical for disinhibiting thalamocortical loops and thereby controlling action selection in the cortex (Albin et al., 1989; Chevalier & Deniau, 1990; Frank et al., 2001; Gurney et al., 2001; Frank, 2005; O’Reilly & Frank, 2006). However, recent simultaneous recordings from the GPi output nucleus and connected areas in the VLa nucleus of the thalamus showed little evidence of such a dynamic (Schwab et al., 2020), consistent with earlier lesion studies showing minimal effects of BG lesions on thalamic motor-related activity (Inase et al., 1996). Instead, GPi and thalamic activity typically moved in the same direction, and movement-related changes in GPi firing typically trailed those seen in the VLa itself. Various other measures showed little direct impact of GPi on thalamic activity.
In contrast to this apparent lack of influence of the BG on the thalamus, direct stimulation studies have shown robust inhibition of thalamus by stimulating the BG output nuclei, consistent with the monosynaptic inhibitory relationship between these neurons (Kim et al., 2017; Catanese & Jaeger, 2021). The most obvious reconciliation of these findings is that the motor task in the Schwab et al. (2020) study involved a simple overtrained reaching action, whereas we would expect the BG influence to occur during the initial learning of novel motor tasks, if indeed the main function of the BG is to support reinforcement learning driven by phasic dopamine changes.
There are a number of studies showing that indeed the influence of the BG over thalamus decreases with learning (Desmurget & Turner, 2010; Piron et al., 2016; Horak & Anderson, 1984), and other studies showing that the BG is only important for initial acquisition but not expression of learned motor actions (Neely et al., 2018; Koralek et al., 2013; Yin et al., 2004; Yin et al., 2009; Turner & Desmurget, 2010).
In summary, the main ascending contribution of the BG to cortical function is to drive learning, via its influence over the thalamus. This is a critical component of our overall PCore motor control model, as explored in the BG dorsal simulation.
The following sections describe additional elements of the BG system, which contribute important functionality, particularly for learning.
Ventral pallidum
Figure 15:
Connectivity of the ventral pallidum (VP), which has a diverse set of cell types with multiple different neurotransmitters as shown, and sends outputs to all the major motivational / affective / neuromodulatory brain areas. NAcc = nucleus accumbens (ventromedial striatum); BLA = basolateral amygdala; LH = lateral hypothalamus; VTA = ventral tegmental area (dopamine); MD = mediodorsal thalamus, which then projects extensively to ventral and medial prefrontal cortex (v/mPFC); LHb = lateral habenula; DR = dorsal raphe (serotonin). Figure from Root et al., 2015.
A major output pathway of the ventral and medial areas of the BG (i.e., ventromedial striatum, VMS) is the ventral pallidum (VP), which can be thought of as a ventral version of the globus pallidus. The VP sends outputs to all the major emotional brain areas (Figure 15 from Root et al., 2015), including the MD thalamus that projects extensively to the ventral and medial areas of the prefrontal cortex, the amygdala, dopamine (VTA), serotonin (dorsal raphe), and the hypothalamus (Kupchik & Prasad, 2021). The SNr output from VMS also projects to MD thalamus, and it is not clear how these pathways differ (Groenewegen et al., 1990).
Thus, the VP is the primary output pathway of the goal-related processing in the Rubicon framework, and these pathways are also critical for the ventromedial BG control over phasic dopamine firing (in the VTA and SNc), as captured in the PVLV model. A core mechanism of PVLV involves direct and indirect pathway neurons in VMS that can drive shunting inhibition and disinhibition of VTA dopamine neurons, while also driving phasic dips in dopamine firing via projections into the lateral habenula. The computational function of these projections is to compute an RPE (reward prediction error) relative to reward predictions learned by VMS neurons, as in the critic of a reinforcement learning model, consistent with recent VP data (Ottenheimer et al., 2020).
Given the fundamental importance of these ventromedial pathways for survival-relevant behavior, the resulting circuitry has been extensively shaped by millions of years of evolution, and is therefore quite a bit more complex and difficult to understand than the more regular, stereotyped structures in the dorsolateral BG and the neocortex, which derive more of their functionality through learning mechanisms within the individual’s lifetime. Nevertheless, it is clear that the same principles of opponent processing through the direct and indirect pathway apply to the VP (Kupchik & Kalivas, 2017; Root et al., 2015; Kupchik & Prasad, 2021), but the pathways and neurotransmitters involved are more complex.
One difference from the dorsolateral striatum is that ventromedial striatal neurons have no clear anatomical distinction between patch (striosomes, discussed next) and matrix, so these different functions are anatomically intermixed. We ascribe the critic RPE functionality to the patch neurons, while the goal-selection gating decision is associated with the matrix projections through to the MD thalamus, as discussed further in prefrontal cortex and PVLV.
Compartmental organization
The dorsal regions of the BG have a collection of important additional properties that support dedicated-weight-gradient-search in action selection and learning, particularly with respect to the credit assignment problem associated with parallel, graded activation of many different elements at the same time. Figure 6 shows that the BG is organized in terms of topographic, convergent input of different sensory-motor inputs to the dorsal striatum, with the SNr/GPi outputs integrating the contributions of many striatal Go vs. No votes to produce an overall graded motor control signal as discussed above.
Superimposed on this topography are additional types of neurons that coordinate learning across the individual SPN neurons that contribute to a common motor output signal. These neurons provide a credit assignment signal specialized to the actual contribution and activity of this motor pathway, in relation to phasic dopamine critic (reward prediction error) signals that ultimately arise from behavioral actions. This specialized credit assignment is essential for assigning more precise credit and blame, and thus greatly speeding the parallel search process relative to an alternative where only purely global, undifferentiated critic signals are applied to all neurons in proportion to their individual activity.
Figure 16:
Compartmental organization of dorsal BG, showing striosome patches vs. surrounding matrix, and the cholinergic interneurons (ChI, aka CINs) that are typically found on the borders of patches. Dopamine axons broadly innervate these regions. Figure from Brimblecombe & Cragg, 2017.
Figure 16 shows these elements, which include:
• Patches of striosomes that are clustered together in the dorsal striatum (Graybiel & Ragsdale, 1978; Gerfen, 1992), surrounded by the matrix neurons that have been the focus of the discussion so far. These patch neurons have distinct projections that provide bidirectional control over the firing of dopamine neurons in the SNc, which then project back up to these same anatomical regions (Stephenson-Jones et al., 2013). The connectivity of these patch neurons is similar to those in the ventromedial striatum via the ventral pallidum as discussed above, and provides a remarkable fit for the requirements of a localized critic system providing specialized dopamine signals based on the activity of the local group.
• CINs are cholinergic interneurons that receive strong inputs from the parafascicular (PF) feedback loops and drive volume conduction of acetylcholine to the surrounding patch and matrix cells, providing the primary mechanism for the integrated motor output signal to modulate learning and activity across a given functional region.
We discuss each of these components below, starting with a further consideration of the nature of the PF feedback signal that provides the source for the credit assignment computations.
Parafascicular feedback loops
As shown in Figure 6, the output signals from the BG in the SNr/GPi feed back into the striatum via the parafascicular (PF) nucleus of the thalamus (Foster et al., 2021; Fallon et al., 2023; Mandelbaum et al., 2019; Alloway et al., 2014). In primates, part of this circuit is called the centromedian (CM) nucleus, but we will use PF as a general term for this feedback connectivity. This pathway provides roughly as much total synaptic input to the striatum as the cortical inputs, so it is an important part of the overall BG circuit. In addition to getting significant input from the BG outputs, PF neurons receive inputs from associated areas of cortex and the superior colliculus (SC). Thus, the inhibitory BG output can modulate these excitatory inputs to provide a graded overall motor-control signal.
The functional implications of this pathway that are captured in the PCore dorsal striatum model include:
• PF neurons project directly to SPNs in the matrix, but critically not to the striosomes, and synapse on distal regions of the dendrite, targeting shafts instead of spines (Smith et al., 2004), with a high ratio of NMDA to AMPA receptors (Ellender et al., 2013). These properties, along with their broad connectivity pattern, suggest a modulatory role on SPNs, potentially also allowing salient sensory inputs from the SC to upregulate relevant motor areas.
• PF provides the major source of glutamatergic input to the CINs, which is the main function of the PF pathway in our model, because the CINs can provide a modulatory signal to both the patch and matrix SPNs, unlike the PF projections.
• PF projections into DLS specifically activate inhibitory interneurons in the matrix, potentially providing an additional termination inhibition that keeps the SPN firing restricted to a narrow time window.
CINs
The cholinergic interneurons (CINs) were previously called the tonically active neurons (TANs) due to the fact that they fired continuously, in contrast to the generally quiescent SPNs. As discussed in Rubicon and PVLV, the acetylcholine (ACh) system is critical for modulating the excitability and learning of many brain areas, including the BG via the CINs. Also, as noted above, the CINs receive significant input from the parafascicular feedback loops conveying BG SNr/GPi output back into the striatum.
CINs are spaced fairly widely and preferentially located at the border between the striosome and matrix cells, where they are thought to provide a bridge between these two sets of striatal neurons (Gonzales & Smith, 2015; Prager & Plotkin, 2019). They are most concentrated in dorsal areas, and are relatively rare in the ventral areas (Abudukeyoumu et al., 2019), consistent with the idea that they are most important for modulating the parallel motor learning process. In ventral goal-learning areas, the centralized LDT ACh signal is more important, as discussed in acetylcholine.
Because CINs are tonically active, their phasic influence is through a stereotypic pattern of brief activity followed by a relatively long pause, and then rebound. This pattern facilitates synaptic plasticity in SPNs (Nair et al., 2015; Doig et al., 2014; Goldberg & Reynolds, 2011; Crittenden et al., 2017), and can directly stimulate dopamine release as well (Abudukeyoumu et al., 2019; Goldberg & Reynolds, 2011). Thus, CINs are ideally situated to provide a credit-assignment modulatory signal for learning in the striosomes (and matrix), based on feedback signals from the PF and other inputs.
Striosomes
Figure 17:
Direct and indirect pathway outputs from the striosomes versus those from the matrix. Direct pathway projections inhibit dopamine neurons in the SNc, while indirect pathway neurons project to a region of the globus pallidus that has excitatory projections to the lateral habenula (LHb), which is exclusively capable of driving dips in dopamine firing. Figure from Grillner et al., 2020.
Figure 18:
The globus pallidus habenula (GPh) pathway involves projections from striosomes to a subset of GPi / entopeduncular (EP) neurons that express glutamate in addition to GABA, and project to the lateral habenula. This is a critical pathway by which the striosomes can modulate dopamine firing. Figure from Stephenson-Jones et al., 2013.
The patch neurons in striosomes have both direct and indirect types (Figure 17). The direct pathway projects to, and directly inhibits, the SNc dopamine neurons that send dopamine back up to the striatum (Evans et al., 2020; Nadel et al., 2021; Okunomiya et al., 2025; Dong et al., 2025), while the indirect pathway projects to the lateral habenula via the GPh (habenula-projecting globus pallidus; Figure 18; Stephenson-Jones et al., 2013; Wallace et al., 2017). The lateral habenula (LHb) is exclusively capable of driving phasic dips in dopamine firing (see PVLV). Thus, these neurons are in a position to regulate the dopamine neuromodulation of the striatum on a relatively topographically organized basis (Joel & Weiner, 2000).
Overall, the striosomal input and output connectivity is similar to that of neurons in the ventromedial striatum, with strong similarities to the ventral pallidum outputs as discussed above. The striosomes, even those located in dorsolateral motor areas, receive input preferentially from the ventral and medial goal-driven brain areas (PL, IL, ACC) and not from motor cortex (Berendse et al., 1992; Gerfen, 1989). Thus, as noted above, the striosomes are ideally configured to provide a localized critic signal for a functional region of BG.
We hypothesize that the cholinergic input from the CINs, driven by PF feedback inputs, provides a signal to specialize the critic learning in a given striosome based on the overall BG output. Specifically, as PF activity occurs during performance of a motor action sequence, the striosomes accumulate a synaptic trace of pre * post synaptic activity modulated (multiplied) by the PF signal as filtered through the CIN activity. This synaptic trace is then modulated by the final outcome-time dopamine RPE signal to determine which direction the synaptic weights change (see below for specific equations).
Figure 19:
Direct (D1) pathway activity in patch neurons for the correct vs incorrect (error) action pools in the BG dorsal sequence learning model. Relatively quickly, these patch neurons learn to differentiate correct vs. incorrect actions on average (Correct > Error), providing a useful additional pool-specific critic modulation.
The net result is that striosomes represent the expected reward associated with a PF output pathway being activated, as a function of the goal context input into the striosomes. The direct pathway striosomes encode the positive reward association (increased with dopamine bursts), while the indirect pathway neurons encode the negative associations (increased with dopamine dips). Figure 19 shows that the direct pathway (D1) patch neurons quickly learn that there is a greater reward expectation for the correct PF output pools relative to the incorrect ones. The converse pattern holds for the D2 indirect neurons as well.
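The patch learning described above can be sketched in a few lines of Python. This is only an illustrative reading of the verbal description, not the PCore implementation: the function names, the scalar per-synapse form, and the simple sign flip used for the D2 pathway are all assumptions made for this example.

```python
def patch_trace_step(trace, pre, post, ach_cin):
    """Accumulate the striosome synaptic trace for one time step.

    The pre * post activity product is multiplied by the PF signal as
    filtered through CIN activity (ach_cin), then added to the running trace.
    All names here are hypothetical.
    """
    return trace + ach_cin * pre * post


def patch_weight_change(trace, da, pathway):
    """Apply the outcome-time dopamine RPE (da) to the accumulated trace.

    Assumption: D1 (direct) weights increase with dopamine bursts (da > 0),
    while D2 (indirect) weights increase with dopamine dips (da < 0),
    modeled here as a sign flip on da.
    """
    if pathway not in ("d1", "d2"):
        raise ValueError("pathway must be 'd1' or 'd2'")
    sign = 1.0 if pathway == "d1" else -1.0
    return sign * da * trace
```

In this sketch, a step with no CIN-filtered PF signal (ach_cin = 0) leaves the trace unchanged, so only the periods in which the corresponding BG output pathway was engaged contribute to the learned reward expectation.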
This local critic signal then shapes learning in the surrounding matrix neurons via projections to the SNc dopamine areas, by modulating the online synaptic trace value that is then driven up or down by the outcome-based global phasic dopamine signal. Thus, the patch-driven local SNc dopamine modulation happens online during the execution of a motor sequence, while the outcome-driven global SNc signal drives the final synaptic weight changes, at the end of a goal-engaged window. The logic that works best in simulations, and makes sense computationally, is also contingent on the PF activity, as follows:
• If the PF indicates that this pool of SPNs was active in shaping the BG output, then the synaptic trace is discounted (reduced) by the positive reward association encoded by direct pathway patch neurons, which drive shunting inhibition onto the SNc. This implements the basic error-driven learning principle of discounting expected success, which prevents overtraining and minimizes interference on other inputs. Interestingly, the indirect pathway patch activity adds to the trace, because if the final outcome is negative, then the negative expectation signalled by these indirect patch neurons suggests they should receive extra blame.
• If the PF indicates that this output pathway was not active, then in principle the matrix SPNs had no impact on the final outcome (for this point in the sequence). However, they can still usefully learn in the opposite direction of the above logic. Specifically, if the patch-based signal indicates that a given region has a net-positive reward expectation at the given point in time, and yet it was not sufficiently activated to drive PF-level outputs, then instead of discounting its contribution, learning in this pathway should be amplified. However, this amplification should only occur if there was a negative overall outcome signal (a dopamine dip), so the sign of this trace contribution is reversed.
The complex dynamical properties of CINs, along with their various other inputs, could potentially support an additional filtering and engagement of these feedback signals, to further focus the patch-based localized critic learning. For example, the SC is a source of stimulus novelty signals, which can modulate BG trace learning for actions taken around novel stimuli.
PCore learning: trace and credit assignment
With the above understanding of the functional role and dynamics of the BG circuit, we can now introduce the learning mechanisms used in the PCore model. Even though the dorsolateral and ventromedial parts of the BG have different functions and modes of activity, they have essentially the same learning rules in our model, although the functional implications are somewhat different. Currently, learning only takes place in the striatal neurons, but future work will explore learning mechanisms in other parts of the circuit, which also receive dopamine innervation and have different dopamine receptors.
Ventromedial learning
As is evident in the neural activity data from VMS neurons shown in Figure 12, there is a significant temporal gap between the initial goal-selection decision to engage in the task and activate the overall plan required to succeed, and the time when the reward outcome occurs. As discussed further in the PVLV model, learning must bridge this gap, so that the outcome can properly shape the goal-selection decision next time around. This temporal credit assignment problem is solved by way of a synaptic tag signal that is encoded at each synapse at the time of goal-selection gating, and is then modulated by the dopamine signal at the time of the outcome to drive the actual weight change. In this way, the tag functions as an eligibility trace, which has also been used in the TD learning rule (reinforcement learning).
The trace / tag component of the learning includes two additive factors:
• A simple delta-rule error term defined over the minus vs. plus phase receiving activations, which allows the striatal neurons to learn from the cortical temporal derivative error gradients as they project down to the striatum; see GeneRec and kinase algorithm for more details.
• A standard Hebbian learning-like synaptic activity factor that reflects the sending and receiving contributions to synaptic calcium influx, via the NMDA receptor, as discussed in synaptic plasticity.
Both of these factors are modulated by the current acetylcholine neuromodulatory level, which provides a widespread goal gating modulation based on the presence of important outcome-associated stimuli (CSs and USs), or novel stimuli, as explained in PVLV:
\[ Tr = \rm{ACh} \left[ x (\rm{CaP} - \rm{CaD}) + \gamma x \rm{CaD} \right] \]
The \(x\) represents the sending neuron activity (using CaD as a longer time-average integration of that value), and \(\gamma = 0.6\) is a weighting factor for the Hebbian term relative to the delta factor.
The weight change computed at the time of the outcome is then the dopamine modulation DA (which is positive for bursts and negative for dips) times the accumulated trace values:
\[ \Delta W = \rm{DA} \sum Tr \]
With the definition of the trace in Eq 1, this is partially a three-factor learning rule, driven by the dopamine RPE signal, sending activity, and receiving activity, which is very widely used in models of BG learning. However, this learning rule adds ACh modulation on top of dopamine, and it includes a minus-plus error gradient factor.
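As a concrete illustration, the trace and weight-change equations for ventromedial learning can be written as a short NumPy sketch. It assumes that CaP and CaD are the receiving neuron's plus-phase and minus-phase values (following the delta-rule description), that \(x\) is a vector of sending activities, and that the trace is accumulated over discrete time steps. The function names are hypothetical and this is not the actual PCore code.

```python
import numpy as np

GAMMA = 0.6  # weighting of the Hebbian term relative to the delta term (from the text)


def vms_trace_step(x, ca_p, ca_d, ach, gamma=GAMMA):
    """One time step of Tr = ACh [x (CaP - CaD) + gamma x CaD].

    x    : sending activities, shape (n_send,)
    ca_p : receiving plus-phase values, shape (n_recv,)   (assumed reading)
    ca_d : receiving minus-phase values, shape (n_recv,)  (assumed reading)
    ach  : scalar acetylcholine level
    Returns a trace matrix of shape (n_recv, n_send).
    """
    delta = np.outer(ca_p - ca_d, x)   # delta-rule error term
    hebb = gamma * np.outer(ca_d, x)   # Hebbian synaptic activity term
    return ach * (delta + hebb)


def vms_weight_change(da, traces):
    """Delta W = DA * sum(Tr): DA is positive for bursts, negative for dips."""
    return da * np.sum(traces, axis=0)
```

Because ACh multiplies the whole bracket, time steps with no ACh contribute nothing to the trace, and the sign of the final weight change is set entirely by the dopamine signal at the time of the outcome.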
Dorsolateral learning
The DLS learning rule has the same weight change equation (Eq 2) but the trace component captures the effects of the patch-driven online modulation as described above, in addition to the same combination of delta-rule and synaptic activity factor as used in VMS (Eq 1):
\[ Tr_a = x (\rm{CaP} - \rm{CaD}) + \rm{ACh}_{cin} (1-p) \gamma x \rm{CaD} \]
where the ACh modulation now comes from the CINs, which in turn are driven by the PF output activity and its other inputs, and it only applies to the synaptic activity factor, not the delta rule here. The patch-based contribution is weighted by a factor \(p\) (0.5 default), and has two forms depending on whether the PF / CIN activity is above threshold:
\[ Tr_{pf} = \rm{ACh}_{cin} p \gamma \left[ (1 - P_{d1}) + P_{d2} \right] x \rm{CaD} \]
or is effectively off:
\[ Tr_{off} = o p \gamma \left[ P_{d2} - P_{d1} \right] x \rm{CaD} \]
where \(o\) is the scaling factor for this weaker off factor (0.1 default), and \(P_{d1}\) is the direct (D1) pathway patch activity, and \(P_{d2}\) is the D2 indirect pathway activity (these factors are both normalized based on average activity in the corresponding pools of patch activities).
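The three dorsolateral trace terms can be combined in a small Python sketch for a single synapse. Two assumptions go beyond the text: the patch-based term is added to \(Tr_a\) to form the total trace, and the above-threshold test on PF / CIN activity is passed in as a boolean because no threshold value is given here. The function and parameter names are hypothetical.

```python
GAMMA = 0.6      # Hebbian weighting factor (from the text)
P_WEIGHT = 0.5   # default weight p of the patch-based contribution (from the text)
OFF_SCALE = 0.1  # default scaling o of the weaker off factor (from the text)


def dls_trace_terms(x, ca_p, ca_d, ach_cin, p_d1, p_d2, pf_active,
                    p=P_WEIGHT, o=OFF_SCALE, gamma=GAMMA):
    """Return (base, patch) trace terms for one synapse at one time step.

    base  : Tr_a   = x (CaP - CaD) + ACh_cin (1 - p) gamma x CaD
    patch : Tr_pf  = ACh_cin p gamma [(1 - P_d1) + P_d2] x CaD   if pf_active
            Tr_off = o p gamma [P_d2 - P_d1] x CaD               otherwise
    p_d1 and p_d2 are the normalized D1 and D2 patch activities.
    """
    hebb = x * ca_d
    base = x * (ca_p - ca_d) + ach_cin * (1.0 - p) * gamma * hebb
    if pf_active:
        patch = ach_cin * p * gamma * ((1.0 - p_d1) + p_d2) * hebb
    else:
        patch = o * p * gamma * (p_d2 - p_d1) * hebb
    return base, patch


def dls_trace(*args, **kwargs):
    """Total trace under the assumption that the two terms simply add."""
    base, patch = dls_trace_terms(*args, **kwargs)
    return base + patch
```

Returning the two terms separately makes it easy to inspect how much of the trace comes from the patch-based critic versus the basic delta-rule and synaptic activity factors.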
Figure 20:
Direct (D1) vs indirect (D2) pathway activity in patch neurons for the active pool in the BG dorsal sequence learning model, over the trials in a single sequence. The system has been getting the first and last actions in the sequence wrong, and this is reflected in the learned patch values as shown (D1 is strong for correct trials, weaker for incorrect, while the opposite holds for D2). The active learning rule will discount learning on the correct actions due to the stronger D1 activity, while driving more learning on the incorrect first and last trials.
The active factor (Eq 4) captures the shunting inhibition effect of the direct pathway inhibition onto the SNc dopamine neurons via the \((1-P_{d1})\) term, which discounts learning to the extent that a pool is already expected to be correct (Figure 20). The indirect pathway modulation of LHb DA via the D2 factor adds positively to the trace so that a negative expectation will drive synaptic decreases if the output is negative (and it will tend to be small otherwise).
The off factor (Eq 5) does not do discounting. Instead, it drives a net DA signal reflecting the difference between the two pathways, D1 - D2, with a minus sign applied. This drives exploratory learning toward pools that were not engaged this time around but still had a higher positive reward association: the resulting weight change is positive if the end result is worse than expected (because the wrong actions were in fact selected).
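A worked example with illustrative numbers may help make the signs concrete. The values \(P_{d1} = 0.8\), \(P_{d2} = 0.2\), \(\rm{ACh}_{cin} = 1\), and \(x \rm{CaD} = 1\) are assumptions chosen for this example; \(p = 0.5\), \(o = 0.1\), and \(\gamma = 0.6\) are the defaults given above. Only the patch-based terms are shown. For a pool that was active:

\[ Tr_{pf} = 1 \times 0.5 \times 0.6 \times \left[ (1 - 0.8) + 0.2 \right] \times 1 = 0.12 \]

With a low positive expectation instead (\(P_{d1} = 0.2\)), the same term would be \(0.5 \times 0.6 \times \left[ 0.8 + 0.2 \right] = 0.30\), so the strong D1 expectation discounts the trace from 0.30 to 0.12. For a pool that was off:

\[ Tr_{off} = 0.1 \times 0.5 \times 0.6 \times \left[ 0.2 - 0.8 \right] \times 1 = -0.018 \]

If the outcome is then a dopamine dip (\(\rm{DA} = -1\)), the off-pool contribution to the weight change from Eq 2 is positive:

\[ \Delta W_{off} = (-1) \times (-0.018) = +0.018 \]

Under these assumed values, the pool that was expected to be rewarding but was not engaged is strengthened after a worse-than-expected outcome.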
This logic is somewhat complex, but every part of the above equations has been extensively tested in the BG dorsal simulation, where the addition of these patch-based modulatory signals makes a significant contribution to being able to learn longer action sequences with more possible actions.
Summary
Befitting its role at the heart of the brain, with its fingers on every major button, the basal ganglia is a complex system with many moving parts, and a considerable degree of functional specialization across different areas, from ventromedial to dorsolateral. Nevertheless, each of these different areas shares the same core circuit, and we have found that the same “core” PCore model performs well for the different computational functions associated with these different areas.
As illustrated by the characteristic response properties of neurons in the ventromedial striatum (VMS) (Figure 12), this area of the BG circuit is critical for the goal selection process (as described by the Rubicon model) and for processing the subsequent outcome of the goal-engaged episode. Interestingly, this goal-selection process performs much of the work involved in action selection, in a proactive manner, so that the motor-related processing in the DLS can focus on a more dynamic, online modulation of moment-by-moment action execution within the context of a larger plan.
There is also evidence that different species have different balances of BG vs. cortical influence on motor control, with primates (including humans) having a much stronger degree of cortical influence over motor control, while rodents and other species with less cortical development are more driven by the BG. Thus, the impact of the BG on prefrontal cortex is likely to be more important for primates, and the prefrontal cortex page is strongly recommended to get a more complete picture of the broader BG / PFC system.
Simulations
• BG ventral simulation is the simpler of the two basic BG models, and is the best way to understand the basic dynamics of the PCore model. It simulates the Rubicon goal-selection process in ventromedial striatum and associated BG circuits through the ventral pallidum, based on cortical input indicating the relative costs vs benefits of selecting a given goal.
• BG dorsal simulation explores the dynamic motor control process supported by the dorsolateral striatum and associated BG circuits, in a simple motor sequencing task where the correct motor sequence is rewarded and others are not.