Stephanie Chan

@scychan_brains

Followers: 3,633
Following: 2,008
Media: 23
Statuses: 531

Staff Research Scientist at Google DeepMind. Artificial & biological brains 🤖 🧠 Views are my own

San Francisco, CA
Joined November 2018
Pinned Tweet
@scychan_brains
Stephanie Chan
1 month
Come check out our work at ICML this week: 🔎 What needs to go right for an induction head? A mechanistic study of in-context learning circuits (spotlight Weds) ✨ Many-Shot In-Context Learning (oral @ LCFM workshop) 🧞 Genie: Generative Interactive Environments (oral Tues)
2
5
37
@scychan_brains
Stephanie Chan
2 years
Intriguingly, transformers can achieve few-shot learning (FSL) without being explicitly trained for it. Very excited to share our new work, showing that FSL emerges in transformers only when the training data is distributed in particular ways! 🧵👇
Tweet media one
14
189
1K
@scychan_brains
Stephanie Chan
2 years
New paper 🥳: Transformer inductive biases! Transformers generalize differently from information stored in: ‣ weights - mostly "rule-based" ‣ context - mostly "exemplar-based" This effect depends on (a) the training data (b) the size of the transformer 🧵⬇️
3
86
614
@scychan_brains
Stephanie Chan
10 months
We all know that in-context learning emerges in transformers... but our new work shows that it can actually then disappear, after long training times! We dive into this **transience** phenomenon. 🧵👇1/N
Tweet media one
7
94
476
@scychan_brains
Stephanie Chan
4 years
First day at @DeepMind tomorrow!! Incredibly excited to be working with @FelixHill84 , Stephen Clark, @AndrewLampinen , and many other amazing researchers!!
13
4
283
@scychan_brains
Stephanie Chan
3 months
This is one of the most meaningful projects I've ever worked on -- aiming to make personalized tutoring universally available. We can use AI to augment human potential and human capital, if we do it responsibly and inclusively, with best practices from education. Still lots to do!
@Google
Google
4 months
Introducing LearnLM: our new family of models based on Gemini and fine-tuned for learning. LearnLM applies educational research to make our products — like Search, Gemini and YouTube — more personal, active and engaging for learners. #GoogleIO
Tweet media one
43
208
1K
5
25
197
@scychan_brains
Stephanie Chan
1 month
Now on Arxiv -- Google DeepMind's current approach to AI for education: Also: * the LearnLM team is hiring! (most likely for London) * Markus Kunesch will be at the GDM booth at ICLR 4pm on Tuesday to answer questions
7
22
174
@scychan_brains
Stephanie Chan
2 years
A beautiful story about winding paths and being guided by the poetry in nature -- made me smile ❤️ "He Dropped Out to Become a Poet. Now He’s Won a Fields Medal."
2
17
147
@scychan_brains
Stephanie Chan
2 years
This is an incredible result. Transformers can meta-learn to do RL, completely from context -- no weight updates.
@MishaLaskin
Misha Laskin
2 years
In our new work - Algorithm Distillation - we show that transformers can improve themselves autonomously through trial and error without ever updating their weights. No prompting, no finetuning. A single transformer collects its own data and maximizes rewards on new tasks. 1/N
24
249
1K
2
14
122
@scychan_brains
Stephanie Chan
1 year
Pretty paradigm-shifting if this data is replicable.. dopamine doesn't seem to encode reward prediction errors, after all! Open question what it does, in that case..
1
17
105
@scychan_brains
Stephanie Chan
2 years
Apparent progress in ML research doesn't always map to real progress - it often isn't generalizable, usable or meaningful. Tomorrow at the ML Evaluation Workshop @iclr_conf , join our many distinguished speakers in discussing and improving this situation!
@agarwl_
Rishabh Agarwal
3 years
The field of ML has seen massive growth and it is becoming apparent it may be in need of self-reflection to ensure that efforts are directed towards real progress. To this end, we are organizing an @iclr_conf workshop on "ML Evaluation Standards". [1/N]
Tweet media one
1
101
437
3
21
89
@scychan_brains
Stephanie Chan
2 years
We've released the codebase for the paper "Data Distributional Properties Drive Emergent In-Context Learning in Transformers" 🥳
@scychan_brains
Stephanie Chan
2 years
Intriguingly, transformers can achieve few-shot learning (FSL) without being explicitly trained for it. Very excited to share our new work, showing that FSL emerges in transformers only when the training data is distributed in particular ways! 🧵👇
Tweet media one
14
189
1K
2
14
80
@scychan_brains
Stephanie Chan
1 year
Why do transformers work so well? @FelixHill84 explains how the architectural features of transformers correspond to features of language! Alternatively check out his excellent lecture covering similar topics:
1
10
72
@scychan_brains
Stephanie Chan
2 years
Inspired by @AnthropicAI 's Constitutional AI, I've been thinking of another legal metaphor: "AI alignment as common law"🧑‍⚖️ Models are trained to be consistent with prior decisions — prev examples of good behavior (SL) or prev judgments of good/bad (RLHF) — i.e. "precedent" 1/
6
8
69
@scychan_brains
Stephanie Chan
3 years
Virtual coding interviews!! Some people have asked for tips, since I happened to do a LOT of them last year 🤣 and developed or gathered some helpful tips and strategies. Some of these were kindly shared by others, so I'm passing it forward here! (thread)
2
10
70
@scychan_brains
Stephanie Chan
5 months
Our new paper delves into the circuits and training dynamics of transformer in-context learning (ICL) 🥳 Key highlights include 1️⃣ A new opensourced JAX toolkit that enables causal manipulations throughout training 2️⃣ The toolkit allowed us to "clamp" different subcircuits to
@Aaditya6284
Aaditya Singh
5 months
In-context learning (ICL) circuits emerge in a phase change... Excited for our new work "What needs to go right for an induction head (IH)?" We present "clamping", a method to causally intervene on dynamics, and use it to shed light on IH diversity + formation. Read on 🔎⏬
2
44
198
1
10
66
@scychan_brains
Stephanie Chan
6 months
Tokenization really matters for number representation! E.g. tokenizing numbers right-to-left (instead of left-to-right) improves GPT-4 arithmetic performance from 84% to 99%!! Awesome important work by @Aaditya6284 @djstrouse
@Aaditya6284
Aaditya Singh
6 months
Ever wondered how your LLM splits numbers into tokens? and how that might affect performance? Check out this cool project I did with @djstrouse : Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs. Read on 🔎⏬
10
33
181
0
9
67
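A minimal sketch, not taken from the paper, of the idea in the tweet above: chunking a number's digits into tokens right-to-left keeps ones/tens/hundreds aligned across numbers, whereas left-to-right chunking splits place values inconsistently. The chunk size of 3 and the function name are assumptions for illustration.

```python
def chunk_digits(number: str, right_to_left: bool = True, size: int = 3) -> list[str]:
    """Split a digit string into fixed-size chunks, e.g. as a tokenizer might."""
    if right_to_left:
        # Chunk from the right so place values (ones/tens/hundreds) stay aligned.
        rev = number[::-1]
        chunks = [rev[i:i + size][::-1] for i in range(0, len(rev), size)]
        return chunks[::-1]
    # Left-to-right chunking can split place values differently per number length.
    return [number[i:i + size] for i in range(0, len(number), size)]

print(chunk_digits("1234567", right_to_left=False))  # ['123', '456', '7']
print(chunk_digits("1234567", right_to_left=True))   # ['1', '234', '567']
```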
@scychan_brains
Stephanie Chan
1 year
Impressive results by LLMs on causal reasoning benchmarks! I'm curious though how much this is driven by the LLMs' priors about what causal structures are reasonable (e.g. ), rather than causal reasoning per se. 1/
@amt_shrma
Amit Sharma
1 year
New paper: On the unreasonable effectiveness of LLMs for causal inference. GPT4 achieves new SoTA on a wide range of causal tasks: graph discovery (97%, 13 pts gain), counterfactual reasoning (92%, 20 pts gain) & actual causality. How is this possible?🧵
33
296
1K
2
8
58
@scychan_brains
Stephanie Chan
3 months
Turns out -- a feature is represented more strongly based on factors beyond its relevance to the training task. E.g. whether it's easy vs hard to compute, or learned early vs late in training. This is true even when comparing features that are learned equally well! These biases
@AndrewLampinen
Andrew Lampinen
3 months
How well can we understand an LLM by interpreting its representations? What can we learn by comparing brain and model representations? Our new paper highlights intriguing biases in learned feature representations that make interpreting them more challenging! 1/
Tweet media one
7
51
324
0
5
58
@scychan_brains
Stephanie Chan
2 years
Incredibly exciting and important news. This is the first time in my LIFE that I've been the first Stephanie Chan on any platform!!! 🥇 😄
0
3
57
@scychan_brains
Stephanie Chan
4 months
With long contexts of up to a million tokens, we can now move from few-shot to *many-shot learning* By using 100s or 1000s of shots, we saw significant improvements on math, reasoning, QA, planning, etc. We may not even need labels in many cases!! 🤯
@agarwl_
Rishabh Agarwal
4 months
We studied In-Context learning with hundreds to thousands of examples. My favorite example: I sent *one million* tokens to Gemini 1.5 Pro for linear classification with 64 dimensional integer-valued vectors and many-shot learning performs similarly to k-Nearest Neighbours.
Tweet media one
6
25
165
3
8
55
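A minimal sketch, under assumed formatting, of how a many-shot prompt can be assembled once long contexts allow hundreds or thousands of labeled examples; the "Input:/Output:" template and the shot limit are illustrative choices, not the paper's setup.

```python
def build_many_shot_prompt(examples, query, max_shots=1000):
    """Concatenate up to max_shots labeled examples, then append the query."""
    shots = [f"Input: {x}\nOutput: {y}" for x, y in examples[:max_shots]]
    return "\n\n".join(shots + [f"Input: {query}\nOutput:"])

# Hypothetical usage: far more demonstrations than classic few-shot prompting.
examples = [(f"{i} + {i}", str(2 * i)) for i in range(500)]
prompt = build_many_shot_prompt(examples, "7 + 7")
```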
@scychan_brains
Stephanie Chan
3 years
Come check out our NeurIPS poster today, on hierarchical memory for RL agents!
@AndrewLampinen
Andrew Lampinen
3 years
Interested in how RL agents could recall the past in detail, in order to overcome the challenges of the present? Come chat with us about "Towards mental time travel: A hierarchical memory for RL agents" at #NeurIPS2021 poster session 1 (4:30 GMT/8:30 PT, spot E1)!
Tweet media one
4
31
195
0
4
50
@scychan_brains
Stephanie Chan
2 years
3/ But while certain data distributions could elicit FSL in transformers, the same training data could *not* elicit FSL in RNNs or LSTMs. Thus, FSL emerges only from applying the right architecture to the right data distribution; neither component is sufficient on its own
Tweet media one
1
4
45
@scychan_brains
Stephanie Chan
2 years
4/ This work helps us understand emergent FSL in large language models, and how we might induce FSL beyond language. Non-uniform naturalistic distributions are an important challenge for reflecting the real world, but also an opportunity for eliciting powerful new capabilities!
Tweet media one
1
1
44
@scychan_brains
Stephanie Chan
6 months
An impressive new kind of generative foundation model! Trained completely unsupervised, it generates endless *controllable world models* that are (1) controllable via interpretable discrete actions (2) generated based on image prompts including drawings! So proud of the Genie
@_rockt
Tim Rocktäschel
6 months
I am really excited to reveal what @GoogleDeepMind 's Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.
145
571
3K
1
2
41
@scychan_brains
Stephanie Chan
4 years
How does the brain learn a "model" for planning and model-based decision making? We found evidence that OFC activity is associated with learning the state-to-state transition function
@biorxiv_neursci
bioRxiv Neuroscience
4 years
Orbitofrontal cortex and learning predictions of state transitions #biorxiv_neursci
0
24
45
1
13
43
@scychan_brains
Stephanie Chan
2 years
1/ Transformers have at least two modes of learning: 1⃣Few-shot in-context learning in the activations 2⃣Slow, gradient-based updates in the weights Certain distributional properties, e.g. burstiness and long-tailedness, could bias transformers to learn in one way or the other!
Tweet media one
1
4
42
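A minimal sketch of what "burstiness" and "long-tailedness" can look like when constructing training sequences: classes are drawn from a Zipfian (long-tailed) marginal, and one class repeats ("bursts") within a sequence. The sequence length, exponent, and sampling scheme here are assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, seq_len, zipf_exponent = 1000, 8, 1.0

# Long-tailed (Zipfian) marginal distribution over classes.
ranks = np.arange(1, num_classes + 1)
probs = ranks ** (-zipf_exponent)
probs /= probs.sum()

def bursty_sequence():
    """Sample a training sequence in which one class repeatedly occurs."""
    burst_class = rng.choice(num_classes, p=probs)
    others = rng.choice(num_classes, size=seq_len // 2, p=probs)
    seq = np.concatenate([np.full(seq_len // 2, burst_class), others])
    rng.shuffle(seq)
    return seq

print(bursty_sequence())
```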
@scychan_brains
Stephanie Chan
1 month
Will be at ICML @ Vienna next week, and Cog Sci @ Rotterdam for one day -- let me know if you'll be there. Would love to meet up with folks!
4
1
41
@scychan_brains
Stephanie Chan
13 days
@AndrewLampinen is one of my favorite people in the world to collaborate with, and anyone would be lucky to work with him. Please apply to the team if you're interested in any of the cognitively-oriented research described below!
@AndrewLampinen
Andrew Lampinen
13 days
Really excited to share that I'm hiring for a Research Scientist position in our team! If you're interested in the kind of cognitively-oriented work we've been doing on learning & generalization, data properties, representations, LMs, or agents, please check it out!
10
60
336
1
7
39
@scychan_brains
Stephanie Chan
2 years
2/ The properties that encourage FSL are exemplified by natural language, but are actually inherent to many kinds of natural data, including first-person experience. And they are a departure from the uniform i.i.d. distributions that typify standard supervised data
Tweet media one
2
1
39
@scychan_brains
Stephanie Chan
20 days
Congratulations to the team for this awesome robotics result!! It really stands out in that (a) it works in an extremely fast-paced setting, unlike most robotics algos (b) it's interesting that it was helpful to have a hierarchical controller that selects different skills and
@GoogleDeepMind
Google DeepMind
20 days
Meet our AI-powered robot that’s ready to play table tennis. 🤖🏓 It’s the first agent to achieve amateur human level performance in this sport. Here’s how it works. 🧵
139
842
4K
1
5
36
@scychan_brains
Stephanie Chan
3 months
More people should know about this interesting paper! It solves exact learning dynamics for a class of nonlinear networks, and uncovers properties that help subnetworks learn faster and win the race over others. Could eg help explain which ones end up "lottery tickets" @jefrankle
@SaxeLab
Andrew Saxe
2 years
“The Neural Race Reduction: Dynamics of abstraction in gated networks” New paper @icmlconf We derive reductions and occasionally exact explicit solutions for the learning dynamics of a class of nonlinear deep networks in the rich representation learning regime.
Tweet media one
2
44
199
0
4
35
@scychan_brains
Stephanie Chan
2 years
A big step forward for fast, sample-efficient RL!
@FeryalMP
Feryal
2 years
I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans. See the thread for more details 👇 [1/N]
25
266
1K
1
4
37
@scychan_brains
Stephanie Chan
2 years
@_jasonwei If you need an antidote to The Bitter Lesson malaise, check out our new work! 💊🙂 We show that it's the *distributional properties* of data, rather than scale per se, that leads to an interesting behavior like few-shot learning in transformers
@scychan_brains
Stephanie Chan
2 years
Intriguingly, transformers can achieve few-shot learning (FSL) without being explicitly trained for it. Very excited to share our new work, showing that FSL emerges in transformers only when the training data is distributed in particular ways! 🧵👇
Tweet media one
14
189
1K
1
0
36
@scychan_brains
Stephanie Chan
2 years
Transformers have the powerful ability to use two different kinds of information: 1⃣ information stored in weights during training (e.g. via gradient descent) 2⃣ information provided in context at inference time (e.g. in a "prompt") (1/)
1
2
36
@scychan_brains
Stephanie Chan
6 months
Agree so hard that, esp with the advent of extremely long context, we need to think deeply about how information behaves differently when you put it in context vs weights vs other kinds of memory! Thanks so much @xiao_ted for the shout-outs to our work! Check out e.g.
@xiao_ted
Ted Xiao
6 months
I can’t emphasize enough how mind-blowing extremely long token context windows are. For both AI researchers and practitioners, massive context windows will have transformative long-term impact, beyond one or two flashy news cycles. ↔️ “More is different”: Just as we saw emergent
Tweet media one
6
59
310
0
3
32
@scychan_brains
Stephanie Chan
3 years
Excitingly, the ML Evaluation Standards workshop @iclr_conf will be collaborating with @SchmidtFutures to grant $15k in awards for workshop submissions and reviewers! More info below 👇
Tweet media one
3
11
32
@scychan_brains
Stephanie Chan
1 month
Our paper comparing human and LM reasoning -- now published (open source)!
@AndrewLampinen
Andrew Lampinen
1 month
Pleased to share that the final version of our work "Language models, like humans, show content effects on reasoning tasks" has now been published in @PNASNexus (open access)! For a still-mostly-up-to-date summary, see this thread.
1
13
77
0
8
32
@scychan_brains
Stephanie Chan
3 years
To date, it's taken half a century to map 35% of human proteins. Now @DeepMind has released predictions on almost the entire human proteome, for free! So many implications.. so proud of the team for this work
@GoogleDeepMind
Google DeepMind
3 years
Today with @emblebi , we're launching the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome, doubling humanity’s accumulated knowledge of high-accuracy human protein structures - for free: 1/
99
3K
7K
1
1
31
@scychan_brains
Stephanie Chan
10 months
Updated: our work comparing humans and language models on reasoning tasks! Neither humans nor LMs are perfect reasoners, and in fact show very similar patterns of errors. E.g., both perform better when the correct answer accords with situations that are familiar and realistic.
@AndrewLampinen
Andrew Lampinen
10 months
Very excited to share a substantial updated version of our preprint “Language models show human-like content effects on reasoning tasks!” TL;DR: LMs and humans show strikingly similar patterns in how the content of a logic problem affects their answers. Thread: 1/
Tweet media one
3
51
279
0
1
31
@scychan_brains
Stephanie Chan
1 year
I'll be at @iclr_conf in Kigali next week -- message me if you'll be there and would like to meet up! ☺️
2
2
31
@scychan_brains
Stephanie Chan
3 years
By popular demand, the RL Reliability Metrics library now supports processing CSV input data, in addition to TF summaries! Now you can easily measure the reliability of your RL model outputs from any ML library! CSV is now the default in the example.
2
9
30
@scychan_brains
Stephanie Chan
10 months
We trained on tasks that can be solved by both in-context learning (ICL) and in-weights learning (IWL). The models initially develop emergent ICL... but then asymptotically give way to IWL! All the while the model loss continues to decrease. 2/N
Tweet media one
1
0
28
@scychan_brains
Stephanie Chan
2 years
5/ Thanks to my amazing collaborators -- I've really loved working with them on this project ❤️: @FelixHill84 , @santoroAI , @AndrewLampinen , @janexwang , @Aaditya6284 , @TheOneKloud , and Jay McClelland
1
1
28
@scychan_brains
Stephanie Chan
10 months
Indeed! Weight decay seems to eliminate ICL transience completely, at least for the training times we tried. This is interesting since many large language models are trained with weight decay. 5/N
Tweet media one
1
1
28
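A minimal JAX/Optax sketch of the kind of intervention described in the tweet above: training with weight decay (AdamW) rather than plain Adam. The learning rate and decay coefficient are placeholders; this is not the authors' training code.

```python
import optax

# Adding weight decay (AdamW) to an otherwise unchanged training setup.
optimizer = optax.adamw(learning_rate=1e-3, weight_decay=1e-4)

# Typical usage: initialize optimizer state from the model parameters, then
# apply updates computed from gradients at each training step.
# opt_state = optimizer.init(params)
# updates, opt_state = optimizer.update(grads, opt_state, params)
# params = optax.apply_updates(params, updates)
```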
@scychan_brains
Stephanie Chan
6 years
@ilyasut I disagree with this! Given the amount of physical analogies that we use in describing these math concepts, even in ML courses, I think it's clear that physics is a more intuitive setting in at least some instances
1
0
27
@scychan_brains
Stephanie Chan
14 days
So many of these researchers are heroes and role models to me, not least my PhD co-advisor @yael_niv 😊 Thanks so much to all of you for being amazing trailblazers and advocates! @ashrewards @natashajaques @ancadianadragan @chelseabfinn @FinaleDoshi Doina Precup and many others
@pcastr
Pablo Samuel Castro
17 days
Really nice initiative by @ben_eysenbach , who prepared these posters (hung around @RL_Conference ) of notable women in RL !
Tweet media one
4
41
249
1
1
27
@scychan_brains
Stephanie Chan
1 year
It's finally here!! "Ada and the Supercomputer", written by my dear friend Doris and now Amazon #1 for teen fiction! The book is inspired by Doris's math PhD, startups, her immense ambition.. The result is adventure+STEM+coming-of-age, and fully a story about resiliency. link👇
Tweet media one
1
1
27
@scychan_brains
Stephanie Chan
9 months
This is so great.. @kpeteryu et al took our insights on data distributions + in-context learning to videos, and improved few-shot learning for video narration!! Amazing to be part of a vibrant research community where others can take our work much farther than we can ourselves 😍
@kpeteryu
Peter Yu
9 months
Ever wondered what it would take to train a VLM to perform in-context learning (ICL) over egocentric videos 📹? Check out our work EILEV! @SLED_AI @michigan_AI Website: Technical Report: A thread 🧵
1
12
28
2
1
23
@scychan_brains
Stephanie Chan
2 years
As a practical matter, these distinctions are important to recognize, so that we know where and how to provide information to transformers (in weights vs in context), depending on what generalization behaviors we prefer. (5/)
2
0
23
@scychan_brains
Stephanie Chan
2 years
Language models are influenced by prior beliefs when they perform reasoning tasks... in similar ways to humans! Important for understanding the limitations of LMs, and parallels with human reasoning. And demonstrates how ML can usefully borrow ideas from cognitive science
@AndrewLampinen
Andrew Lampinen
2 years
Abstract reasoning is ideally independent of content. Language models do not achieve this standard, but neither do humans. In a new paper (co-led by Ishita Dasgupta) we show that LMs in fact mirror classic human patterns of content effects on reasoning. 1/
7
56
378
1
3
22
@scychan_brains
Stephanie Chan
4 months
Learning without gradients 😊 Submit for the 1st Workshop on In-Context Learning at ICML -- July 27 in Vienna!
@julien_siems
Julien Siems
4 months
Excited to announce the 1st Workshop on In-Context Learning (ICL) at ICML. #ICML2024 #ICL
1
9
34
0
4
22
@scychan_brains
Stephanie Chan
4 months
Amazing work by the Astra AI Assistant team!! Huge potential for accessibility or for folks who have low vision
@GoogleDeepMind
Google DeepMind
4 months
We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life. 🤝 Watch it in action, with two parts - each was captured in a single take, in real time. ↓ #GoogleIO
223
1K
4K
1
1
22
@scychan_brains
Stephanie Chan
2 years
In fact, pretrained language models are much more rule-based from context, compared to the "neutrally trained" transformers - perhaps due to the combinatorial nature of language. But this effect is modulated by model size: larger language models are more rule-based! (4/)
Tweet media one
1
0
22
@scychan_brains
Stephanie Chan
1 year
It's often said that causal learning requires active intervention.. but e.g. many of us learn how to do science just from reading about it! This is a more nuanced take on how passive learning (as in language models) can lead to learning about causality and experimentation
@AndrewLampinen
Andrew Lampinen
1 year
What can be learned about causality and experimentation from passive data? What could language models learn from simply passively imitating text? We explore these questions in our new paper: “Passive learning of active causal strategies in agents and language models” Thread: 1/
Tweet media one
11
74
401
0
3
22
@scychan_brains
Stephanie Chan
10 months
This phenomenon of transience is pretty surprising! And may have important ramifications, esp if we continue moving towards "overtraining" small models for long training times. Excited to continue investigating what is happening here! 📈📉 9/9
Tweet media one
1
0
21
@scychan_brains
Stephanie Chan
10 months
Second, if we apply regularization selectively to different parts of the model, we see mitigation only when weight decay is applied to the MLP layers... and in fact this is where in-weights learning may largely reside (e.g. Geva et al, 2020). 8/N
Tweet media one
2
0
20
@scychan_brains
Stephanie Chan
10 months
Can we eliminate ICL transience? It's the opposite of "grokking", where models use *more* general solutions after long train times -- whereas here we might consider IWL the *less* general solution. Regularization may drive grokking.. could it also keep our models using ICL? 4/N
2
0
21
@scychan_brains
Stephanie Chan
2 years
Transformers tend to generalize in a "rule-based" way from information stored in weights, but in an "exemplar-based" way from information stored in context! However, you can overcome the exemplar-based bias by training a transformer explicitly on a rule-based task. (3/)
1
0
21
@scychan_brains
Stephanie Chan
2 years
@AlexGDimakis If you'd like your faith restored in the endeavor of ML research, check out our new work! We show that the *distributions* of data matter, rather than scale, for eliciting an interesting behavior like few-shot learning
@scychan_brains
Stephanie Chan
2 years
Intriguingly, transformers can achieve few-shot learning (FSL) without being explicitly trained for it. Very excited to share our new work, showing that FSL emerges in transformers only when the training data is distributed in particular ways! 🧵👇
Tweet media one
14
189
1K
1
1
21
@scychan_brains
Stephanie Chan
2 years
@YiTayML @jacobmbuckman Agree with @YiTayML that you'd have a hard time getting the same results with LSTMs. See our results showing that LSTMs don't exhibit in-context/few-shot learning when transformers do, matched on data + num params (Fig 7)
2
1
20
@scychan_brains
Stephanie Chan
3 years
❤️ Yes thank you @WiMLworkshop for the feature! Surprising results about surprise.. OFC aids learning about state transitions, but not via prediction errors. Instead OFC activity correlated with humans correctly expecting a more probable outcome.. i.e. more optimal predictions!
@yael_niv
Yael Niv
3 years
Thank you, @WiMLworkshop , for highlighting this work by @scychan_brains , inspired by rodent experiments from Geoff Schoenbaum's lab, first piloted in humans in our lab by @ninalopatina maybe 13 years ago!! This was one of those long projects... All the kudos to @scychan_brains !!
0
0
8
0
1
19
@scychan_brains
Stephanie Chan
2 years
So awesome to see how RL is now effective on complex real world problems. Still remember reading @AlexIrpan 's superb essay "Deep RL doesn't work yet", and the general uncertainty around RL's efficacy, just a few years ago!
@CauseMean
Cosmin Paduraru
2 years
Excited to share the details of our work at @DeepMind on using reinforcement learning to help large-scale commercial cooling systems save energy and run more efficiently: . Here’s what we found 🧵
9
77
485
1
3
19
@scychan_brains
Stephanie Chan
2 years
Join us for the #NeurIPS panel today on transformer-related topics! @tsiprasd and I will discuss our two papers on in-context learning in transformers, both selected as orals! This year, orals will be presented as 15-min deep-dive discussions 🥽🫧🫧
@scychan_brains
Stephanie Chan
2 years
Intriguingly, transformers can achieve few-shot learning (FSL) without being explicitly trained for it. Very excited to share our new work, showing that FSL emerges in transformers only when the training data is distributed in particular ways! 🧵👇
Tweet media one
14
189
1K
1
3
18
@scychan_brains
Stephanie Chan
3 years
Just out: Our new commentary on how AI and Psychology can learn from each other to address challenges in generalizability. "Fast publishing" (more common in AI) promotes rapid iteration and inclusivity, while "slow publishing" (more common in Psych) integrates knowledge over time
@AndrewLampinen
Andrew Lampinen
3 years
New commentary on "The Generalizability Crisis" by @talyarkoni : "Publishing fast and slow: A path toward generalizability in psychology and AI." We argue that these fields share similar generalizability challenges, and could learn from each other.
2
11
51
0
1
17
@scychan_brains
Stephanie Chan
3 years
Congrats to @agarwl_ et al on the Outstanding Paper award @NeurIPS !! It's such important work. If you're a fan of rigorous RL evaluation, you may also be interested in our ICLR 2020 work on measuring the reliability of RL itself:
@GoogleAI
Google AI
3 years
Congratulations to the authors of “Deep RL at the Edge of the Statistical Precipice”, a #NeurIPS2021 Outstanding Paper ()! You can learn more about it in the blog post below, and we look forward to sharing more of our research at this year’s @NeurIPSConf .
7
75
384
2
0
18
@scychan_brains
Stephanie Chan
2 years
We investigate inductive biases using a paradigm that allows us to distinguish between: 📏"rule-based" generalization, based on parsimonious rules 💠"exemplar-based" generalization, based on direct comparison with the features of observed examples (2/)
Tweet media one
2
0
18
@scychan_brains
Stephanie Chan
2 years
Amazing how far language-conditioned robotics has come, from just a couple years ago! @coreylynch was one of the earliest to see the potential. Congrats to him, @peteflorence , and the other authors!
@peteflorence
Pete Florence
2 years
"Interactive Language: Talking to Robots in Real Time" - Real-time, interactive, open-vocabulary, language+pixels -> actions - A new scale (~600,000 traj.) for language-conditioned behavior - Dataset, sim, models, code all to be released! (1/n)...
8
184
837
1
1
16
@scychan_brains
Stephanie Chan
2 years
Absolutely. We will inevitably need additional new methods, but newer larger LMs (based on the same architectures) can already solve a number of tasks that were previously deemed out of reach (i.e. not tailored for LMs)
@_jasonwei
Jason Wei
2 years
This ignores a huge body of work. We have seen new abilities emerge with every new SOTA language model. New abilities will keep emerging. BIG-Bench claims to target weaknesses of language models, and 540B PaLM beat avg. human rater on more than half just by scaling.
10
25
256
0
0
17
@scychan_brains
Stephanie Chan
10 months
This work was done with my amazing collaborators!!! @Aaditya6284 @ted_moskovitz @ermgrant @saxelab @FelixHill84 N/N
1
0
17
@scychan_brains
Stephanie Chan
3 years
New preprint with @AndrewLampinen , Andrea Banino, and @FelixHill84 !
@AndrewLampinen
Andrew Lampinen
3 years
How can RL agents recall the past in detail, in order to behave appropriately in the present? In our new preprint "Towards mental time travel: A hierarchical memory for RL agents" () we propose a memory architecture that steps in this direction.
3
69
339
0
0
16
@scychan_brains
Stephanie Chan
10 months
First, we can mitigate ICL transience by increasing the width of the model (hence relieving competition between ICL and IWL). (Increasing width does not affect ICL if IWL is not a valid strategy for the training problem, i.e. width is not helping ICL directly) 7/N
Tweet media one
1
0
17
@scychan_brains
Stephanie Chan
10 months
We observed this behavior across a range of settings (model depth, dataset size, language model embeddings vs image inputs) 3/N
Tweet media one
1
0
17
@scychan_brains
Stephanie Chan
1 year
Pet peeve: Anyone doing multiple-choice evals needs to account for "surface form competition"! TLDR: the highest probability answer isn't always the one with highest model "belief", because a single concept can take multiple forms in text. The good news: it's easy to account for
@universeinanegg
Ari Holtzman
3 years
🔨ranking by probability is suboptimal for zero-shot inference with big LMs 🔨 “Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right” explains why and how to fix it, co-lead w/ @PeterWestTM paper: code:
Tweet media one
5
43
175
0
2
17
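A minimal sketch of one way to account for the issue described above; this is not the linked paper's exact method (which uses domain-conditional PMI), just the simpler idea of pooling probability mass over the surface forms that map to the same underlying answer before comparing. The answer strings below are hypothetical.

```python
import math

def score_concepts(form_logprobs, forms_by_concept):
    """Sum probability mass over all surface forms that map to the same concept."""
    return {
        concept: sum(math.exp(form_logprobs[f]) for f in forms if f in form_logprobs)
        for concept, forms in forms_by_concept.items()
    }

# Hypothetical log-probabilities a model assigns to each answer string.
logprobs = {"couch": -1.2, "sofa": -1.4, "table": -1.0}
concepts = {"seat": ["couch", "sofa"], "table": ["table"]}
scores = score_concepts(logprobs, concepts)
print(max(scores, key=scores.get))  # "seat": pooled mass beats the single form "table"
```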
@scychan_brains
Stephanie Chan
10 months
But why does ICL transience happen in the first place? We find two pieces of evidence which indirectly but convergently point to the same cause: competition with IWL circuits. 6/N
1
0
17
@scychan_brains
Stephanie Chan
1 year
These are great resources for teachers and for educating kids about AI -- the @RaspberryPi_org team has been so impressively thoughtful in creating them, and I'm really excited to see them released today!!
@RaspberryPi_org
Raspberry Pi Foundation
1 year
EXCITING NEWS 🎉 Experience AI launches today in partnership with @DeepMind . Our new AI and machine learning programme for teachers, students, and other educators. Find out more 👉 #AI #MachineLearning #DeepMind #ExperienceAI
Tweet media one
18
232
985
0
3
16
@scychan_brains
Stephanie Chan
2 months
One of those gems that is both theoretically interesting and has practical import!
@_ironjr_
Jaerin Lee
3 months
📣📣📣 We are excited to announce our new paper, “Grokfast: Accelerated Grokking by Amplifying Slow Gradients”! 🤩 Reinterpreting ML optimization processes as control systems with gradients acting as signals, we accelerate the #grokking phenomenon up to X50, making a step
Tweet media one
5
14
121
1
1
16
@scychan_brains
Stephanie Chan
2 years
*Transformer inductive biases* Come check out our #NeurIPS poster today at the MemARI workshop! (and check out the rest of the workshop too -- a really interesting lineup!) Video for those who can't make it in person:
@scychan_brains
Stephanie Chan
2 years
New paper 🥳: Transformer inductive biases! Transformers generalize differently from information stored in: ‣ weights - mostly "rule-based" ‣ context - mostly "exemplar-based" This effect depends on (a) the training data (b) the size of the transformer 🧵⬇️
3
86
614
1
1
15
@scychan_brains
Stephanie Chan
1 year
Pause the excitement about language models for a second -- lots of innovative and exciting things are still happening in control! Co-evolving a mechanical body with a neural network (but using gradient descent!). Read to the last animation
@denizzokt
Deniz Oktay
2 years
Super excited to introduce Neuromechanical Autoencoders! We build "artificial mechanical intelligence" by coupling parametric neural networks with parametric mechanical metamaterials, accepted to ICLR 2023 as a Spotlight!
3
41
223
0
0
15
@scychan_brains
Stephanie Chan
2 months
Very cool work on controlling the balance between in-context and in-weights learning
@surajk610
Suraj Anand
2 months
How robust are in-context algorithms? In new work with @michael_lepori , @jack_merullo , and @brown_nlp , we explore why in-context learning disappears over training and fails on rare and unseen tokens. We also introduce a training intervention that fixes these failures.
Tweet media one
2
11
79
0
1
15
@scychan_brains
Stephanie Chan
4 months
Come to Rotterdam and chat about in-context learning with us!
@marcel_binz
Marcel Binz
4 months
Excited to announce our full-day workshop on “In-context learning in natural and artificial intelligence” at CogSci ( @cogsci_soc ) 2024 in Rotterdam (with @JacquesPesnot @akjagadish @summerfieldlab and Ishita Dasgupta).
4
20
83
1
1
14
@scychan_brains
Stephanie Chan
1 year
I think this paper hasn't gotten enough attention (prompt tuning models for new tasks when you only have API access). Prediction: lots of small actors will be doing this soon
1
1
13
@scychan_brains
Stephanie Chan
2 years
Really excited to participate in these discussions next week.. registration is still open (and free!)
@raphaelmilliere
Raphaël Millière
2 years
Program now live! June 29 – Why Compositionality Matters for AI w/ @AllysonEttinger , @paul_smolensky , @GaryMarcus & myself June 30 – Can Language Models Handle Compositionality? w/ @_dieuwke_ , @tallinzen , @elliepavlick , @scychan_brains & @LakeBrenden
4
28
110
0
0
12
@scychan_brains
Stephanie Chan
3 years
Like many others, we argue that ML could benefit from slower, more careful publishing. But also -- perhaps unfashionably -- we argue that "fast science" has benefits too.. for inclusivity, rapid iteration, and more
@AndrewLampinen
Andrew Lampinen
3 years
Excited that our commentary "Publishing fast and slow: A path toward generalizability in psychology and AI" is out now! The legendary @talyarkoni even agrees with some of it.
1
9
37
1
1
11
@scychan_brains
Stephanie Chan
2 years
@DeepMind @santoroAI @AndrewLampinen @janexwang @Aaditya6284 @TheOneKloud @FelixHill84 Also see our related work, on Zipfian environments for reinforcement learning! Code will be released soon, for those of you itching to start exploring non-uniform distributions for RL:
Tweet media one
0
2
11
@scychan_brains
Stephanie Chan
1 year
This is a game changer. Replace RLHF with a theoretically identical (and much cheaper to train) form of supervised learning
@archit_sharma97
Archit Sharma
1 year
Ever wondered if the RL in RLHF is really needed? Worried that you might really need to understand how PPO works? Worry no more, Direct Preference Optimization (DPO) allows you to fine-tune LMs directly from preferences via a simple classification loss, no RL required. 🧵 ->
16
132
785
0
1
10
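A minimal PyTorch-style sketch of the DPO objective the quoted thread describes: a classification-style loss on preference pairs, built from log-probability ratios of the policy against a frozen reference model. The beta value and the toy numbers are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO: logistic loss on the difference of policy-vs-reference log-ratios."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Hypothetical per-example sequence log-probs for preferred/rejected responses.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
```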
@scychan_brains
Stephanie Chan
3 years
Check out our new work on how **generating explanations** can help RL agents, by enabling better representations of the causal and relational structure of the world!
@AndrewLampinen
Andrew Lampinen
3 years
Explanations play a critical role in human learning, particularly in challenging areas—abstractions, relations and causality. We show they can also help RL agents in "Tell me why!—Explanations support learning of relational and causal structure" (). Thread:
3
38
213
1
1
10
@scychan_brains
Stephanie Chan
28 days
I'm trying to find a paper I briefly saw, proposing a potential hypothesis for why many-shot learning might eventually decrease in performance -- could anyone point me to it? 🙏
6
0
10
@scychan_brains
Stephanie Chan
3 years
Come intern at DeepMind!
@AndrewLampinen
Andrew Lampinen
3 years
DeepMind internship applications are open! (Deadline October 4th)
0
4
21
0
1
10
@scychan_brains
Stephanie Chan
2 years
Maybe this analogy helps us think about the pros and cons of this type of "stare decisis" in training (e.g. self-consistency and predictability, vs difficulty with novel cases) 2/
1
0
8
@scychan_brains
Stephanie Chan
7 months
@_jasonwei Our work (and now others' replications in multiple domains) shows that this would significantly harm in-context learning abilities, unfortunately
@scychan_brains
Stephanie Chan
2 years
Intriguingly, transformers can achieve few-shot learning (FSL) without being explicitly trained for it. Very excited to share our new work, showing that FSL emerges in transformers only when the training data is distributed in particular ways! 🧵👇
Tweet media one
14
189
1K
0
0
9
@scychan_brains
Stephanie Chan
7 months
Stay tuned ;)
@Aaditya6284
Aaditya Singh
7 months
Excited for our transience work to be highlighted in the update from @ch402 @AnthropicAI . Their Transformer Circuits thread has always been an inspiration to me -- actively working on mechanistic analyses of transience and should have updates soon :)
Tweet media one
1
4
28
0
0
9
@scychan_brains
Stephanie Chan
2 years
Great overview by @weidingerlaura on how to think more precisely about the potential risks of LLMs
@andrey_kurenkov
Andrey Kurenkov
2 years
Had a great time talking to @weidingerlaura about some of the recent papers she and her colleagues at @DeepMind have published about LLMs - big fan!
0
4
7
0
2
8
@scychan_brains
Stephanie Chan
6 months
RLAIF may not be beneficial if you do SFT with a strong teacher
@archit_sharma97
Archit Sharma
6 months
High-quality human feedback for RLHF is expensive 💰. AI feedback is emerging as a scalable alternative, but are we using AI feedback effectively? Not yet; RLAIF improves perf *only* when LLMs are SFT'd on a weak teacher. Simple SFT on a strong teacher can outperform RLAIF! 🧵->
Tweet media one
13
52
335
0
0
9
@scychan_brains
Stephanie Chan
7 months
Super impressive work led by @thtrieu_ on solving IMO geometry problems
@GoogleDeepMind
Google DeepMind
7 months
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐 It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
126
1K
4K
0
0
8
@scychan_brains
Stephanie Chan
1 year
Really loved the "compositionality gap" metric in this paper, where (correctness on final answer)/(correctness on subproblems) stayed constant at 40% across model sizes! Very curious whether this holds for GPT-4 as well
@OfirPress
Ofir Press
2 years
We've found a new way to prompt language models that improves their ability to answer complex questions Our Self-ask prompt first has the model ask and answer simpler subquestions. This structure makes it easy to integrate Google Search into an LM. Watch our demo with GPT-3 🧵⬇️
52
306
2K
1
0
8
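A minimal worked sketch of the ratio described in the tweet above, under the assumption that it compares correctness on the composed (final) question against correctness on its sub-questions; the record format is hypothetical.

```python
def compositionality_ratio(results):
    """Among questions whose sub-questions were both answered correctly,
    the fraction where the composed (final) question was also answered correctly."""
    sub_correct = [r for r in results if r["sub_ok"]]
    if not sub_correct:
        return 0.0
    return sum(r["final_ok"] for r in sub_correct) / len(sub_correct)

# Hypothetical eval records: sub_ok = both sub-answers right, final_ok = final answer right.
records = [{"sub_ok": True, "final_ok": True},
           {"sub_ok": True, "final_ok": False},
           {"sub_ok": False, "final_ok": False}]
print(compositionality_ratio(records))  # 0.5
```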
@scychan_brains
Stephanie Chan
3 years
YC is a careful, thoughtful scientist bringing together social psychology and neuroscience in new and interesting ways, and I've always thought that he would be an exceptional mentor -- I highly recommend taking a look at his lab opening!
@YuanChangLeong
Yuan Chang Leong
3 years
I am looking to hire a lab manager to help set up my new lab @UChicagoPsych ! If you’re interested in studying how motivation influences how we see, think, decide, and interact with others, please consider applying. More info: . Please help spread the word!
38
164
311
0
0
8