Jack Lindsey Profile
Jack Lindsey

@Jack_W_Lindsey

Followers
1,398
Following
213
Media
45
Statuses
202

Interested in understanding neural networks (all kinds!) -- @AnthropicAI. Previously @cu_neurotheory.

Joined January 2019
Pinned Tweet
@Jack_W_Lindsey
Jack Lindsey
19 days
Recently at @CogCompNeuro I led a tutorial on sparse autoencoders for LLM interpretability. For anyone interested, here's the link!
0
45
340
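The tutorial's topic lends itself to a compact illustration. Below is a minimal sketch of the standard ReLU-plus-L1 sparse autoencoder recipe for dictionary learning on LLM activations; the names, shapes, and coefficient are illustrative assumptions, not taken from the tutorial itself.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: reconstruct LLM activations through an
    overcomplete, sparsely active feature layer (hypothetical shapes)."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(feats)          # reconstructed activations
        return recon, feats

def sae_loss(x, recon, feats, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparsity;
    # the candidate interpretable objects are the learned features.
    mse = ((x - recon) ** 2).sum(dim=-1).mean()
    return mse + l1_coeff * feats.abs().sum(dim=-1).mean()
```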
@Jack_W_Lindsey
Jack Lindsey
4 years
New preprint! We explore a meta-learning approach to achieving biologically plausible learning in neural networks, using feedback and local plasticity. In our experiments we match (and in some cases outperform) gradient descent-based learning. Thread: [1/n]
4
61
258
@Jack_W_Lindsey
Jack Lindsey
8 months
What kinds of representations do neural networks learn? In this paper w/ @allemanjm and @StefanoFusi2, to appear at #ICLR2024, we find & explain effects of task structure and choice of nonlinearity on learned representational geometry. Thread:
2
39
213
@Jack_W_Lindsey
Jack Lindsey
4 months
I joined the mechanistic interpretability team at Anthropic recently, and it's been really exciting. To any comp. neuroscience followers -- I highly recommend following this literature, or getting involved yourself! You can keep up with our research here:
5
15
205
@Jack_W_Lindsey
Jack Lindsey
2 years
Midbrain dopamine (DA) activity is thought to enable reinforcement learning (RL) by signaling reward prediction error (RPE). But DA also signals movement! In this new work we show how movement-related DA activity can fit into the RL picture. (1/13)
1
34
179
@Jack_W_Lindsey
Jack Lindsey
6 months
Excited to share the capstone work of my PhD, w/ @vulcnethologist @Datta_Lab and Ashok Litwin-Kumar. We think it’s an exciting step in understanding how the brain implements reinforcement learning, at a neuronal and algorithmic level.
3
27
130
@Jack_W_Lindsey
Jack Lindsey
4 months
Really excited about this work from our team! Personally, I am amazed by some of the findings. The kinds of abstractions you can find represented inside this model are quite deep, and often surprising. I expect this is just the beginning for large-scale interpretability!
@AnthropicAI
Anthropic
4 months
New Anthropic research paper: Scaling Monosemanticity. The first ever detailed look inside a leading large language model. Read the blog post here:
Tweet media one
68
573
2K
4
12
126
@Jack_W_Lindsey
Jack Lindsey
2 years
New preprint! Why do learning and memory often involve “consolidation” between different systems in the brain? We develop a general model of systems consolidation and analyze its computational properties, esp. as compared to synaptic consolidation. (1/21)
1
25
101
@Jack_W_Lindsey
Jack Lindsey
4 years
How does neural connectivity support learning and memory? This (very long) paper provides insights, diving deep into the connectome of the fly mushroom body. A huge team effort that I was fortunate to be part of. Here are some of my favorite parts (1/22)
2
31
91
@Jack_W_Lindsey
Jack Lindsey
1 year
New preprint with Elias Issa! We show evidence that the primate visual cortex, and neural network models that most resemble it, factorize different types of information about visual scenes into orthogonal subspaces: Thoughts welcome!
2
17
85
@Jack_W_Lindsey
Jack Lindsey
4 years
What makes convolutional neural networks brain-like? We’ve identified new properties besides classification performance, like factorization of scene parameters, that predict resemblance to visual cortex data. Check out our #COSYNE2021 poster 1-048 (w/ Elias Issa) on Wednesday!
1
5
56
@Jack_W_Lindsey
Jack Lindsey
2 years
Check out our poster (I-031) tomorrow @CosyneMeeting! "Cortical dopamine enables deep reinforcement learning and leverages dopaminergic heterogeneity" We propose a role for dopamine in representation learning in RL. Would love to discuss with anyone interested! #COSYNE2023
1
4
31
@Jack_W_Lindsey
Jack Lindsey
2 years
I'll be at NeurIPS this week presenting on biological models of off-policy RL (), and giving a brief talk on models of systems memory consolidation at the MemARI workshop (). DM me if you want to chat about RL, memory, or neuro-AI!
0
4
29
@Jack_W_Lindsey
Jack Lindsey
1 year
Part 2 of our work (led by Kevin) on the role of motor cortex + basal ganglia in "flexible" vs. "automatic" tasks! Esp. excited about a phenomenon (which we model!) where subcortical consolidation of automatic behaviors is prevented by simultaneously practicing flexible behaviors
@KevinMizes
Kevin Mizes
1 year
Ever wonder how the pianist can perform a new piece from sheet music, or effortlessly play a well-practiced sonata in a concert? We study the underlying neural circuits in ‘piano-playing’ rats and show that motor cortex and basal ganglia play different roles.
4
23
128
0
3
22
@Jack_W_Lindsey
Jack Lindsey
5 years
Excited to give a talk tomorrow at #ICLR2019 on using convolutional neural networks to understand animal visual processing! Title: A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs. Link:
1
3
21
@Jack_W_Lindsey
Jack Lindsey
6 months
For anyone at @CosyneMeeting, I'm presenting new work on how RL works in the basal ganglia on Friday (02-087). I'm excited about this work, which clarifies some puzzling empirical findings and suggests a new understanding of the learning algorithm used by the BG. Tweetprint coming soon!
0
2
19
@Jack_W_Lindsey
Jack Lindsey
3 years
A paper I contributed to in undergrad, exploring modular error-correcting dynamics in mouse ALM, is now out! Congrats to lead authors Guang Chen and Byungwoo Kang, and PIs Nuo Li and Shaul Druckmann (@ShaulDr). My favorite parts are (1/2)
2
3
16
@Jack_W_Lindsey
Jack Lindsey
2 years
New fiscal year, new fiscal me.
2
1
16
@Jack_W_Lindsey
Jack Lindsey
10 months
Check out our new work on the theory of multi-task learning and pretraining+finetuning!
@SamuelLippl
Samuel Lippl
10 months
What are the consequences of training networks on multiple tasks? @Jack_W_Lindsey and I give a theoretical description of the biases of pretraining + finetuning and multi-task learning. There are surprising findings with practical implications! (1/15)
2
25
90
0
0
15
@Jack_W_Lindsey
Jack Lindsey
4 years
This work builds a lot on ideas and methods from @KhurramJaved_96 + Martha White, @jeffclune, @thomasmiconi, @chelseafinn. And might be interesting to others recently working on bio-plausible credit assignment, including @tyrell_turing, @kordinglab, @NeuroAILab. [Bonus/14]
1
2
13
@Jack_W_Lindsey
Jack Lindsey
4 years
I'll be presenting this work at NeurIPS tomorrow (Tuesday) at virtual poster session 2, 12 - 2pm. Stop by if you're interested or have questions! Paper: GitHub:
@Jack_W_Lindsey
Jack Lindsey
4 years
New preprint! We explore a meta-learning approach to achieving biologically plausible learning in neural networks, using feedback and local plasticity. In our experiments we match (and in some cases outperform) gradient descent-based learning. Thread: [1/n]
4
61
258
0
2
12
@Jack_W_Lindsey
Jack Lindsey
4 years
Leveraging advances in meta-learning, we optimize neural networks for the task of learning itself, subject to biological constraints. These networks learn by making predictions, propagating error info through feedback weights, and performing Hebbian-style weight updates. [4/n]
Tweet media one
1
0
11
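As a toy illustration of the learning rule just described (predict, propagate error through feedback weights, update locally), here is a sketch; the dimensions, nonlinearity, and exact update form are assumptions rather than the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 10, 32, 3

W1 = rng.normal(0, 0.1, (d_hid, d_in))   # feedforward weights (meta-learned init)
W2 = rng.normal(0, 0.1, (d_out, d_hid))
B = rng.normal(0, 0.1, (d_hid, d_out))   # fixed feedback weights

def plasticity_step(x, y_target, lr=0.01):
    """One inner-loop update: predict, feed the error back, update locally."""
    global W1, W2
    h = np.tanh(W1 @ x)            # hidden activity
    y = W2 @ h                     # prediction
    err = y_target - y             # prediction error
    fb = B @ err                   # error info sent through feedback weights
    W2 += lr * np.outer(err, h)    # Hebbian-style: error x presynaptic activity
    W1 += lr * np.outer(fb, x)     # local update driven by the feedback signal
```

Note that, unlike backprop, nothing here requires the feedback weights B to mirror W2; in the paper's setup it is the meta-optimization over initializations that makes a rule of this kind effective.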
@Jack_W_Lindsey
Jack Lindsey
6 years
Upcoming ICLR paper!
@StphTphsn1
Stephane Deny
6 years
Why are receptive fields circular in the retina but sharply oriented in primary visual cortex (V1)? Answers in our new #ICLR2019 paper using a deep convolutional model of the visual system: . With @Jack_W_Lindsey, @SamOcko and @SuryaGanguli. ⬇ THREAD ⬇
Tweet media one
8
94
266
0
2
9
@Jack_W_Lindsey
Jack Lindsey
2 years
New work led by @KevinMizes on the role of striatum in flexible vs. automatic movements!
@KevinMizes
Kevin Mizes
2 years
Thrilled to share my graduate work with PI @BOlveczky and co-authors @Jack_W_Lindsey & @seanescola: 'piano playing' rats! tl;dr - striatum is needed to produce automatic but not flexible sequences, but it codes for & controls low-level kinematics for both
5
40
157
0
2
9
@Jack_W_Lindsey
Jack Lindsey
4 years
Finding 5: Though FLP networks with plasticity in earlier layers perform better following learning, they have less useful features (as measured by ability to linearly readout task targets) at initialization, suggesting a tradeoff between adaptability and "innate" skill [11/n]
Tweet media one
1
0
8
@Jack_W_Lindsey
Jack Lindsey
4 years
And also some limitations. Scaling up the meta-learning approaches to long “lifetimes” will be difficult. And there are still gaps to be closed before we can claim full “biological plausibility” (let alone “actually happening in the brain”) [13/n]
2
0
8
@Jack_W_Lindsey
Jack Lindsey
4 years
Some thoughts: (1) there may be many bio-plausible learning algorithms that are effective, but hard to think of without directly meta-optimizing for them, and (2) some aspects of neural circuits that are hard to square with backprop may be features, not bugs! [14/14]
2
0
8
@Jack_W_Lindsey
Jack Lindsey
4 years
Deep nets are powerful, but does the way they learn relate to the brain? Lots of great recent work has proposed more biologically plausible approximations to backprop, e.g. using local circuit mechanisms that enforce symmetry between feedforward and feedback pathways. [2/n]
1
0
7
@Jack_W_Lindsey
Jack Lindsey
4 years
Here we pursue another approach. Backprop-trained deep networks have well-known shortcomings. They learn slowly, iterating over large datasets. Humans are capable of more rapid, online learning. Could deviating from backprop help us replicate these abilities in neural nets? [3/n]
1
0
6
@Jack_W_Lindsey
Jack Lindsey
8 months
We’d love to extend these analyses to more realistic tasks and understand the behavior of deep networks better. Let us know if you have any comments or ideas!
2
0
6
@Jack_W_Lindsey
Jack Lindsey
4 years
and @_Nils_Otto_, Lisa Marin, @gsxej, @ScottishWaddell, and many other wonderful collaborators for being great to work with + getting me up to speed (at least a little) on fly neuroscience, and of course @janeliaflyEM for this awesome data. (22/22)
0
0
6
@Jack_W_Lindsey
Jack Lindsey
4 years
Finding 3: On continual learning tasks, where the data distribution changes over time, FLP networks do even better in our experiments than their gradient-based counterparts! [8/n]
1
0
6
@Jack_W_Lindsey
Jack Lindsey
2 years
What is this term? We call it “action surprise.” It measures the deviation of the action just performed by the agent/animal from the action that would be performed by the reinforcement learner (i.e. the BG) alone, absent influence from other regions. (7/13)
Tweet media one
1
0
6
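Schematically, the proposed dopamine signal augments the classic RPE with this term (notation mine; the squared-deviation form is an illustrative guess at the shape, not necessarily the paper's exact equation):

```latex
\delta_t \;=\; \underbrace{r_t + \gamma V(s_{t+1}) - V(s_t)}_{\text{reward prediction error}}
\;+\; \underbrace{\beta \,\lVert a_t - a_t^{\mathrm{BG}} \rVert^2}_{\text{action surprise}}
```

where a_t is the action actually executed and a_t^BG is the action the BG alone would have selected.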
@Jack_W_Lindsey
Jack Lindsey
2 years
Thanks to my advisor Ashok Litwin-Kumar for his many contributions and guidance throughout this project, and for his patience as we cycled through crazy ideas #1 through #(N-1) before arriving at this one :). (13/13)
1
0
6
@Jack_W_Lindsey
Jack Lindsey
6 months
Exactly as the model predicts, we found action-specific SPN subpopulations in which “difference mode” activity (dSPN - iSPN gap) rises prior to action onset but “sum mode” activity (dSPN + iSPN total) rises following action onset.
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
4 years
There is lots to explore here. Our experiments used feedforward networks with fixed, one-layer feedback pathways. One can imagine recurrent networks with plastic, multilayer feedback pathways! [12/n]
1
0
5
@Jack_W_Lindsey
Jack Lindsey
4 years
Finding 1: These “FLP” networks learn the benchmark tasks. Through ablations, we find that their use of feedback is important for their learning — that is, they are in fact performing “credit assignment” to upstream layers. [6/n]
1
0
5
@Jack_W_Lindsey
Jack Lindsey
4 years
Highlight #1 : PN->KC connectivity has previously been modeled as random, an organization which (via expansion of representation dimensionality) supports discrimination between many sensory stimuli. The data, however, reveals significant deviations from randomness… (7/22)
Tweet media one
1
0
5
@Jack_W_Lindsey
Jack Lindsey
4 years
…in the success of FLP networks in continual learning. We observe that FLP network updates interfere less with the network’s behavior on previously learned tasks than gradient-based updates do [10/n].
Tweet media one
1
0
5
@Jack_W_Lindsey
Jack Lindsey
8 months
In particular, we wanted to understand hidden reps in two dimensions: how similar are they to the training targets (cf. “neural collapse”), and how similar are they to the input patterns? (In the paper we also consider the consequences for disentanglement.)
1
1
5
@Jack_W_Lindsey
Jack Lindsey
8 months
Why does this happen? We find it’s the asymmetric saturation of the ReLU function. Plotting target alignment as a function of input-target alignment (i.e. separability), we see that changing the saturation behavior makes ReLU networks behave roughly like tanh networks.
Tweet media one
1
0
3
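The asymmetry is easy to see numerically. A quick check (illustrative only, not an analysis from the paper):

```python
import numpy as np

x = np.linspace(-3, 3, 7)
relu_grad = (x > 0).astype(float)   # ReLU: gradient exactly zero for all x < 0
tanh_grad = 1 - np.tanh(x) ** 2     # tanh: saturates symmetrically on both sides
# A ReLU unit is blind to everything on one side of its threshold, so its
# selectivity depends on initialization; tanh treats both signs alike.
```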
@Jack_W_Lindsey
Jack Lindsey
4 years
@ericjang11 @StephaneDeny @niru_m My guess is that some additional bells and whistles in the optimization, a move to gradient-free (e.g. evolutionary) approaches, and/or some kind of curriculum learning during meta-training will be necessary. But I think / hope these are solvable problems!
1
0
5
@Jack_W_Lindsey
Jack Lindsey
4 years
Across lifetimes, we meta-optimize the initializations of the feedforward and feedback weights (green parts of prev. figure). We use standard meta-learning benchmarks in ML — nonlinear regression, and few-shot image classification — the net faces a new task each lifetime. [5/n]
1
0
5
@Jack_W_Lindsey
Jack Lindsey
11 months
Run it back
@Jack_W_Lindsey
Jack Lindsey
2 years
New fiscal year, new fiscal me.
2
1
16
0
0
5
@Jack_W_Lindsey
Jack Lindsey
2 years
All this was done with my advisor Ashok Litwin-Kumar, who in addition to being generally awesome, displayed a remarkable tolerance for mission creep as this project transformed from humble beginnings as a model of two Drosophila neurons into [whatever it is now]. (21/21).
1
0
4
@Jack_W_Lindsey
Jack Lindsey
4 years
Finding 2: On i.i.d. learning tasks, where data is drawn randomly from a fixed distribution, FLP networks match comparable gradient-based learners (which also have meta-learned initializations, for fair comparison). [7/n]
1
0
4
@Jack_W_Lindsey
Jack Lindsey
8 months
We find that one-layer Tanh network representations inherit the geometry of the target outputs even when the output is barely separable, while one-layer ReLU networks tend to retain more of the structure of the inputs.
Tweet media one
1
0
4
@Jack_W_Lindsey
Jack Lindsey
2 years
We propose a general hypothesis: that not all experiences are equally worth learning about, and a primary function of the “short-term” learning system is to determine which memories should be consolidated and gate plasticity in the long-term system accordingly. (4/21)
1
0
4
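A cartoon of the gating hypothesis, under the purely illustrative assumption that "worth learning about" is summarized by a scalar importance score per memory (all names hypothetical):

```python
import numpy as np

def gated_consolidation(w_short, w_long, importance, threshold=0.5, rate=0.1):
    """Cartoon: the short-term system scores each memory and gates
    plasticity in the long-term system accordingly."""
    gate = (importance > threshold).astype(float)   # which memories to consolidate
    return w_long + rate * gate * (w_short - w_long)
```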
@Jack_W_Lindsey
Jack Lindsey
8 months
We can better understand this phenomenon by plotting the hidden layer weights on top of their average gradients, finding that the asymmetry of the ReLU nonlinearity causes some neurons to gain target-orthogonal selectivity depending on their initialization.
Tweet media one
1
0
4
@Jack_W_Lindsey
Jack Lindsey
2 years
including coarse (as opposed to high-dimensional) movement signaling, DA tuning for movement onset, correlation of DA activity with kinematics, and waning movement signaling with task practice (10/13)
Tweet media one
1
0
4
@Jack_W_Lindsey
Jack Lindsey
3 years
@RobertRosenba14 @KordingLab for learning that takes place over long timescales (e.g. storing memories for later use), the path you take in parameter space may look nothing like gradient descent, even if the loss decreases by the end -- and the "non-gradient" components may be essential for final performance
0
0
4
@Jack_W_Lindsey
Jack Lindsey
6 years
Our work presented at #cosyne19 on understanding properties of early stage visual processing
@StphTphsn1
Stephane Deny
6 years
Two #cosyne19 posters elucidating early visual system hallmark properties, w/ @SamOcko @Jack_W_Lindsey & @SuryaGanguli: 1) Why are receptive fields concentric in the retina and oriented in primary visual cortex? We address this question with an anatomically constrained deep CNN.
Tweet media one
1
10
38
0
1
4
@Jack_W_Lindsey
Jack Lindsey
4 years
Finding 4: FLP networks learn in a qualitatively different fashion from gradient-based learners. Their updates are weakly (sometimes negatively!) correlated with the direction of a gradient-based update. These differences presumably play a role… [9/n]
Tweet media one
1
0
4
@Jack_W_Lindsey
Jack Lindsey
2 years
We intend our work to inspire further work on biologically inspired off-policy RL algorithms, and to motivate new experimental study of the role of movement-related DA in learning that can support, challenge, or suggest extensions to our model (12/13).
1
1
4
@Jack_W_Lindsey
Jack Lindsey
3 years
@KordingLab @itamarlandau @TonyZador @tyrell_turing @PessoaBrain @KordingLab Potential concerns (also sorry if jumping in randomly violates Twitter etiquette!): E[perturb + select] only equals gradient if (1) perturbations are sufficiently small rel. to curvature of loss landscape, and (2) "select" happens fast enough rel. to "perturb"...
2
0
4
@Jack_W_Lindsey
Jack Lindsey
8 months
A word on the consequences of target alignment: when targets are low-dimensional, target-aligned representations can be thought of as “disentangled” or “abstract” representations of the output coordinates, which discard other information about the inputs
1
0
4
@Jack_W_Lindsey
Jack Lindsey
4 years
Highlight #4 : DANs receive extensive modulation from MBONs, both direct and multisynaptic, within and across compartments. Broadly, this suggests a “critic” role for MBONs in contributing to learning signals, beyond their classical “actor” role in driving behaviors. (14/22)
Tweet media one
Tweet media two
1
0
3
@Jack_W_Lindsey
Jack Lindsey
4 years
…most strikingly at the level of KC subtypes specializing for different sensory modalities. We suggest this architecture reflects a prior that the association between multimodal stimuli and their valences is factorizable across modalities (no olfactory-visual XORs). (8/22)
Tweet media one
Tweet media two
1
1
3
@Jack_W_Lindsey
Jack Lindsey
4 years
@ericjang11 @StephaneDeny @niru_m This is a great post! I very much agree with the approach described. But I do think executing on this idea may not be so straightforward to do at scale. Backprop (naively implemented) doesn't seem great at handling long-horizon meta-learning problems.
1
0
3
@Jack_W_Lindsey
Jack Lindsey
4 years
@ampanmdagaba Not yet, hopefully getting there soon!
0
0
3
@Jack_W_Lindsey
Jack Lindsey
4 years
Mathy interpretation: this organization allows high-rank updates to the KC-MBON weight matrix, rather than the rank-1 updates given by Hebbian-style (or ML-style) learning rules. Modeling suggests that this flexibility can be beneficial given a factorizable task structure (10/22)
Tweet media one
1
0
3
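In symbols (my notation, schematic): with KC activity vector k and a single global learning signal, a Hebbian update is an outer product of rank 1, whereas C compartment-specific signals m_c can sum to an update of rank up to C:

```latex
\Delta W_{\text{Hebbian}} = \eta\,\mathbf{m}\,\mathbf{k}^{\top} \quad (\operatorname{rank}\ 1)
\qquad \text{vs.} \qquad
\Delta W_{\text{compartmental}} = \sum_{c=1}^{C} \eta_c\,\mathbf{m}_c\,\mathbf{k}^{\top} \quad (\operatorname{rank} \le C)
```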
@Jack_W_Lindsey
Jack Lindsey
6 months
One way to think about this is in population activity space. Action selection is driven by the dSPN - iSPN “difference modes,” but learning is governed by activity in dSPN + iSPN “sum modes.” The efference input excites the sum modes without interfering with the difference modes.
Tweet media one
1
0
3
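For a single action channel with dSPN activity d, iSPN activity i, and a shared efference input e (notation mine), the cancellation is explicit:

```latex
(d + e) - (i + e) = d - i
\qquad \text{(difference mode: action selection unchanged)}
\\
(d + e) + (i + e) = d + i + 2e
\qquad \text{(sum mode: the learning signal carries the efference copy)}
```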
@Jack_W_Lindsey
Jack Lindsey
6 years
See our recent NeurIPS paper!
@StphTphsn1
Stephane Deny
6 years
1/ Why are there so many *types* of ganglion cells transmitting visual information from the retina to the brain? In this joint work with @SamOcko, Jack Lindsey and @SuryaGanguli, we tried to answer this question through the lens of efficient coding:
Tweet media one
1
28
77
0
0
2
@Jack_W_Lindsey
Jack Lindsey
6 months
Actually, it doesn’t make sense! The iSPN plasticity rule is *backward.* It reinforces patterns of activity that led to negative outcomes, causing the striatum to *further* suppress actions that were *already being suppressed* when the negative outcome occurred.
Tweet media one
1
0
3
@Jack_W_Lindsey
Jack Lindsey
6 months
For this to work, the efference input needs to be strong relative to the non-efference input. That is, we predict that *most* (but not all) observed SPN activity is not causally influencing action selection, but rather passively encoding it to enable learning.
1
0
1
@Jack_W_Lindsey
Jack Lindsey
5 months
@Neuro_Skeptic I think the burgeoning field of mechanistic interpretability of LLMs / other deep learning models (which I now work in!) is a relevant case study (caveat: of course these models are different than brains). It's hard but there has been a lot of progress in recent years!
0
0
3
@Jack_W_Lindsey
Jack Lindsey
2 years
We show that an approximate form of continuous Q-learning can be implemented in a tractable, biologically plausible fashion by adding a movement-related term to the RPE term present in classic on-policy actor-critic models of the basal ganglia. (6/13)
1
0
3
@Jack_W_Lindsey
Jack Lindsey
4 years
@blake_camp_1 @KordingLab @tyrell_turing @TonyZador @neuro_data @bradpwyble You might find this review interesting: (Though the cell-intrinsic parameters it focuses on might be different from those you have in mind)
1
0
3
@Jack_W_Lindsey
Jack Lindsey
8 months
We also analyze deeper networks on nonlinearly separable tasks, parametrically varying a measure of task difficulty. We find that (1) target alignment increases across layers, and (2) Tanh learns more target-aligned representations than ReLU, especially as task difficulty increases
Tweet media one
1
0
3
@Jack_W_Lindsey
Jack Lindsey
8 months
This phenomenon holds across a broad family of tasks with different input and output geometries (where we control the task structure by sampling tasks with a specified alignment of input and output kernels), and for convolutional networks trained on CIFAR / STL.
Tweet media one
1
0
3
@Jack_W_Lindsey
Jack Lindsey
4 years
@tyrell_turing @blake_camp_1 @risi1979 @enasmel @neurograce @TonyZador @kenneth0stanley @jeffclune @asoltoggio @ThomasMiconi @KordingLab @NeuroAILab @hardmaru In the spirit of "networks do what they are trained to do," I feel like meta-learning for "general purpose" learning -- where task distribution is vast + diverse -- is basically unexplored (to my knowledge) and it's hard to say whether it can work or not.
4
0
3
@Jack_W_Lindsey
Jack Lindsey
8 months
We first train one-hidden-layer networks and track how much the geometry of the hidden layer activations resembles those of the inputs vs. the labels by measuring kernel alignment
1
0
3
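For concreteness, the standard linear centered kernel alignment computation looks like this (the paper may use a variant; this is the common form, with hypothetical array names):

```python
import numpy as np

def _center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_alignment(A, B):
    """Centered linear kernel alignment between two representations
    (rows = examples); 1 means identical geometry up to rotation/scale."""
    Ka, Kb = _center(A @ A.T), _center(B @ B.T)
    return float((Ka * Kb).sum() / (np.linalg.norm(Ka) * np.linalg.norm(Kb)))

# Usage: hidden activations H vs. inputs X and one-hot labels Y
# input_alignment  = kernel_alignment(H, X)
# target_alignment = kernel_alignment(H, Y)
```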
@Jack_W_Lindsey
Jack Lindsey
8 months
We first trained on binary classification tasks, controlling task structure by the linear separability of the output variable. When the output is not linearly separable in the input space, single-layer networks cannot extract a representation that is aligned with the target
Tweet media one
1
0
3
@Jack_W_Lindsey
Jack Lindsey
3 years
@RobertRosenba14 @KordingLab Thanks for the shout out! Yeah I support maintaining a bit of gradient skepticism :). Even in a single-objective setting, the only requirement is that the loss function needs to decrease. If the loss is very bumpy, good parameter updates may be unrelated to gradients. And…
1
0
3
@Jack_W_Lindsey
Jack Lindsey
4 years
@blake_camp_1 @tyrell_turing @risi1979 @enasmel @neurograce @TonyZador @kenneth0stanley @jeffclune @asoltoggio @ThomasMiconi @KordingLab @NeuroAILab @hardmaru Yeah could be. Hard to say without making a serious effort though. @niru_m and @Luke_Metz seem to have made some progress on making these kinds of problems more tractable (). Obviously not close to a solved problem though
0
0
3
@Jack_W_Lindsey
Jack Lindsey
6 months
How does the brain circumvent this issue? Here's our idea: we’ve seen that SPN activity prior to action selection doesn’t mesh with the SPN plasticity rules. But what if there's another source of SPN activity with different structure, that occurs shortly *after* action selection?
1
0
2
@Jack_W_Lindsey
Jack Lindsey
8 months
The learning behavior is strongly modulated by input noise, which raises the separability threshold at which Tanh networks learn target-aligned representations and exerts a non-monotonic influence on ReLU network representations.
Tweet media one
1
0
3
@Jack_W_Lindsey
Jack Lindsey
6 months
We propose that the striatum receives an “efference copy” of the recently selected action that excites both the dSPNs that promote that action *and* the iSPNs that suppress it. These effects cancel out in the moment, but cause the correct learning to occur for future trials
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
Q-learning is a well-known off-policy algorithm that involves learning values of state-action combinations. But naive implementations of Q-learning are intractable in continuous action spaces, like those encountered in realistic motor control tasks. (5/13)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
6 months
Efference input allows the striatum to know what the brain is doing, so it can learn from it. Because of the dSPN / iSPN plasticity rules, this input doesn’t interfere with ongoing decision-making! The system is perfectly designed to multiplex signals to enable off-policy RL.
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
4 years
To some extent, similarity of inputs to DANs correlates with similarity of outputs from their corresponding MBONs, perhaps reflecting a form of credit assignment in which compartments receive different learning signals dependent on their functional/behavioral roles (13/22)
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
The RL literature also contains many *off-policy* RL algorithms, which can learn effectively from data driven by other policies (e.g. exploration policies, offline demonstrations, and replay buffers). (4/13)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
However, these models typically implement on-policy RL algorithms, which assume that experiences used in learning are driven by the learned policy. On-policy algorithms may be inappropriate models of BG learning, given that other brain regions also contribute to behavior. (3/13)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
Classic models of RL in the basal ganglia (BG) involve RPE-signaling dopamine activity that modulates plasticity at cortico-striatal synapses. This mechanism can implement common RL algorithms like temporal difference learning, policy gradients, and actor-critic models. (2/13)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
4 years
To reiterate, these tweets cover only a small and biased subsample of the paper. Read it for the full scoop! Many thanks to Feng Li and Gerry Rubin for leading this effort, Ashok Litwin-Kumar and Larry Abbott for advising / helping with my contributions, … (21/22)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
3 years
@KordingLab @itamarlandau @TonyZador @tyrell_turing @PessoaBrain (If #2 isn’t satisfied then effective step size is larger, cf. #1 ). Also even if expectations are equal, expected update can be a poor approx. of an optimizer’s actual behavior given finite step size (see e.g. SGD vs. GD). Is there empirical reason not to worry about these?
1
0
2
@Jack_W_Lindsey
Jack Lindsey
6 months
Some background: Spiny projection neurons (SPNs) in the dorsal striatum influence action selection. Direct-pathway dSPNs and indirect-pathway iSPNs are thought to promote and suppress actions, respectively.
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
We find adding action surprise to dopamine activity (on top of RPE) improves learning when the BG shares behavioral control with other controllers. And it is *essential* for effective learning in the fully off-policy regime, when non-BG regions completely dictate behavior. (8/13)
Tweet media one
1
1
2
@Jack_W_Lindsey
Jack Lindsey
6 months
That’s all! Our study raises many questions: where do the efference copies come from? How does the brain take advantage of off-policy RL? Could the same plasticity rule trick be used elsewhere in the brain? We hope to find out!
1
0
2
@Jack_W_Lindsey
Jack Lindsey
6 months
Many thanks to my awesome collaborators and my thesis committee -- Larry Abbott, @neuro_kim, @VikramGadagkar, and @naoshigeuchida -- for their many helpful comments and suggestions.
0
0
2
@Jack_W_Lindsey
Jack Lindsey
5 years
@TheGregYang This is awesome! Two questions if you know: (1) how does the GP for vanilla RNNs, T time steps, input only at first time step, differ from that of depth-T MLP? (2) Can you still get GP behavior if weights are sparse? (e.g. w~Gaussian w.p. eps=k/N, else 0, k fixed as N->infinity)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
4 years
@tyrell_turing Yeah I agree. I think the dream would be a mechanism that is general-purpose when necessary but can also take shortcuts. Probably requires more ingredients than just fixed feedback + Hebbian learning. Maybe (wild speculation) even an extra level of "meta"...
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
So there are computational reasons why “action surprise” in dopamine activity might be useful. But does it actually explain experimentally observed movement-related DA activity? The model makes several qualitative predictions that are consistent with experiment, (9/13)
1
0
2
@Jack_W_Lindsey
Jack Lindsey
6 months
Our model is testable! We tested our key model predictions in fiber photometry and calcium imaging data from freely behaving mice collected by Markowitz et al.
1
0
2
@Jack_W_Lindsey
Jack Lindsey
4 years
Also, connectivity-based clusters include DANs from different compartments, suggesting that such signals may be distributed across the population. These findings motivate extensions to classical RL models that incorporate distributed, vector-valued learning signals (12/22)
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
6 months
This brings us to our last section. Off-policy RL requires different algorithms from on-policy RL. The standard temporal difference learning model of dopamine activity is an *on-policy* model that *does not work* when the striatum shares control of behavior with other regions.
Tweet media one
1
0
2
@Jack_W_Lindsey
Jack Lindsey
2 years
The implementation of this consolidation rule depends on the task, architecture, and learning rule that dictate where synaptic updates come from. We spell out implementations for simple versions of supervised learning, reinforcement learning, and auto-associative memory. (8/21)
Tweet media one
1
1
2
@Jack_W_Lindsey
Jack Lindsey
4 years
Background: The mushroom body (MB) processes sensory info w/ a largely feedforward structure: projection neurons (PNs) -> Kenyon Cells (KCs, high-dim + sparse) -> output neurons (MBONs). MBONs are organized in compartments innervated by dopamine neurons (DANs). (3/22)
Tweet media one
1
0
1
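A toy version of this expansion-then-readout architecture (cell counts, connection probability, and the top-k sparsification are illustrative, not fly data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pn, n_kc, n_mbon, k_active = 50, 2000, 20, 100

J = (rng.random((n_kc, n_pn)) < 0.1).astype(float)  # sparse PN -> KC wiring
W = rng.normal(0, 0.1, (n_mbon, n_kc))              # plastic KC -> MBON weights

def mb_forward(pn_activity):
    drive = J @ pn_activity
    thresh = np.partition(drive, -k_active)[-k_active]
    kc = np.where(drive >= thresh, drive, 0.0)       # sparse, high-dim KC code
    return W @ kc                                    # MBON readout
```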
@Jack_W_Lindsey
Jack Lindsey
4 years
This architecture challenges the conception of MBONs as pure “output” units and KC-MBON plasticity as “fitting readout weights" -- rather, this plasticity is upstream of much layered, nonlinear computation (to ponder: are there computational benefits to this arrangement?) (18/22)
1
0
1
@Jack_W_Lindsey
Jack Lindsey
6 months
Q-learning, on the other hand, works off-policy. In a previous paper we showed how Q-learning can be implemented biologically, by adding an “action surprise” term to the standard reward prediction error.
1
0
1