Day 3: Officially entered the job market of my local anarchist commune. Dismayed to learn that an honors thesis on Thucydides would only net two La Croix and a jar of Cheez Whiz per day.
#chazseattle
Life update: I've joined the Frontier Red Team
@AnthropicAI
for the summer!
Building safe models is the *highest* priority for our field, and I'm thrilled to be working with an amazing team to secure that future.
I'll be in SF Jun-Aug, and excited to see friends old and new!
Humans communicate preferences by providing rich linguistic feedback. Yet, preference-learning algorithms do not always take this social learning view into account.
We leverage pragmatic communication for RLHF in our
#ICML2024
paper!
Paper: 🧵⬇️
Introducing LGA (Language-Guided Abstraction) at ICLR 2024! 🧵
📰 Paper:
🌐 Website:
🗞️ MIT News:
State abstraction is key to generalizable learning, but how do we know which features are task-relevant?
Humans use abstractions for data-efficient learning. We wish for neural networks to do the same.
In our proposed human-in-the-loop framework, we automatically generate a spectrum of abstractions and allow users to deploy task-appropriate ones.
To appear at
#NeurIPS2023
! [1/n]
I'm excited to share our
#ICML2023
paper: we develop a user-informed framework for eliciting feedback to diagnose and fix policy failures.
Project page: . [1/8]
very unclear if currently living in seattle, east berlin, or book 3 of the hunger games rn. do i still have to pay rent in the capitol hill autonomous zone?
I stand in solidarity with
@MITGradUnion
. All workers, students or not, should have the chance to organize and fight for better benefits, affordable housing, and more. Putting power in the hands of workers is how we level the playing field.
Excited to present an encore of our AAAI oral at
#CHI2022
#TRAIT2022
today!
As we continue to deploy AI recommender systems in the real world, it is important to understand their impact on human BIAS and ACCURACY.
On the ground for
#ICLR2024
!
I'd love to talk about:
1. Personalized preference learning
2. Building abstractions from language
3. The future of AI safety (and policy)
I'll also be presenting the following work👇
On the plane to
#NeurIPS2023
!
Excited to talk Human-Guided Complexity-Controlled Abstractions on Thurs + help host GCRL Workshop on Fri ().
I'd love to chat about:
1) abstraction in RL
2) learning world models from humans
3) the future of AI safety!
New paper: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
We survey over 250 papers to review challenges with RLHF with a focus on large language models. Highlights in thread 🧵
I unfortunately will be unable to make it to
#ICML2024
, but
@dabelcs
will be presenting Pragmatic Feature Preferences () at the main conference! Hope everyone gets a chance to ride the roller coasters by the conference center :)
An oldie but a goodie - representation alignment is important! Even more so now, in the age of scaling up human feedback for task learning.
Come chat w/us at HRI about it :)
This is a block from where I live. The protest was peaceful for 4 hours, as it was yesterday, and as it was the day prior. There are elderly and children on my street, which is now filled with tear gas. The only riot here was the one started by
@SeattlePD
.
@seattleprotests
"Reliable carbon accounting won't solve the climate crisis, but it is essential for implementing strategies that could." Check out our paper out in
@Nature
co-led by
@amyluers
and
@LeehiYona
!
I don't often get to meet folks doing *radically* cool things outside of my field nowadays -- but this community continues to amaze and inspire me.
If you're a technologist between 18 and 23, consider applying!
Very excited to announce that I'll be joining
@MIT
's AeroAstro department as a Boeing Assistant Professor in Fall 2024. I'm thankful to my mentors and collaborators who have supported me during my PhD, and I look forward to working with students and colleagues at
@MITEngineering
.
How can robots align their representations of the task with their human partners? Really excited to be organizing a
#CoRL2022
workshop exploring this topic!
We have an amazing lineup of speakers, and we're now accepting submissions until 10/21:
“We’re going to have to embrace our shared humanity if we’re going to get through this.” Closing remark from IL congressman
@RepBillFoster
(the only physics PhD in Congress) at
@Aon_plc
’s AI event in the context of our society getting through the inevitable AI onslaught
Fam calls from Wuhan offering reassurance that they're ok and to get tested early since it's free and US healthcare best in world. Did not have heart to explain that in nation of best healthcare in world, I would neither be able to get tested early nor for free.
#CoronaVirusSeattle
Into the woods:
#AI
goes mushroom foraging to learn how humans make choices. A new machine learning framework by
@tianminshu
,
@TheAndiPenguin
,
@dabelcs
, & more promises to make systems more personal and ethical. Learn more:
Announcing the NeurIPS 2023 Workshop on Goal-Conditioned Reinforcement Learning (GCRL).
We welcome either a 5-minute video or 2-page paper by October 4th, 2023.
More info:
🧵🎉 Our new preprint is up, and we’d love your feedback! We're "Getting Aligned on Representational Alignment" - the degree to which internal representations of different (biological & artificial) information processing systems agree. 🧠🤖🔬🔍
#CognitiveScience
#Neuroscience
#AI
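One common way to quantify this kind of agreement (my example, not necessarily the paper's chosen measure) is linear Centered Kernel Alignment. A minimal sketch:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n_stimuli x n_features).
    Returns 1.0 for perfectly aligned representations; invariant to
    permutation, rotation, and isotropic scaling of features."""
    X = X - X.mean(axis=0)  # center each feature
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
brain = rng.normal(size=(100, 50))  # e.g. stimulus x neuron recordings
model = brain[:, ::-1] * 2.0        # permuted, rescaled copy of the same code
print(linear_cka(brain, model))     # 1.0: CKA ignores permutation and scale
```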
1. LGA () on Tuesday morning
2. PLGA () in the LLM Agents workshop on Saturday
3. I'll also be a panelist in the RepAlign workshop () on Saturday
Please email or DM if you'd like to meet up!
To automatically construct diverse abstractions, we use a discrete information bottleneck approach to trade off complexity, informativeness, and utility of neural representations. The key idea is that penalizing complexity allows us to induce more general abstractions. [2/n]
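For intuition, here is a minimal sketch of what a discrete bottleneck with a complexity penalty could look like (illustrative PyTorch, not the paper's implementation; the Gumbel-softmax encoding and the entropy proxy for the rate are my assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteBottleneck(nn.Module):
    """Toy discrete bottleneck: map inputs to a categorical code,
    then decode the code for the downstream task. Illustrative only."""
    def __init__(self, in_dim, n_codes, n_classes):
        super().__init__()
        self.encoder = nn.Linear(in_dim, n_codes)     # logits over discrete codes
        self.decoder = nn.Linear(n_codes, n_classes)  # task head on the code

    def forward(self, x, tau=1.0):
        logits = self.encoder(x)
        # Gumbel-softmax yields a differentiable, (near-)one-hot code.
        code = F.gumbel_softmax(logits, tau=tau, hard=True)
        return self.decoder(code), logits

def ib_loss(pred, target, logits, beta=0.1):
    task = F.cross_entropy(pred, target)        # informativeness / utility
    p = F.softmax(logits, dim=-1).mean(dim=0)   # marginal code usage
    complexity = -(p * (p + 1e-8).log()).sum()  # H(Z), an upper bound on the rate I(X;Z)
    # Larger beta => fewer codes effectively used => coarser, more general abstraction.
    return task + beta * complexity
```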
Our goal is to foster interdisciplinary discussions on topics such as learning with multimodal human feedback, learning without tagged rewards, interaction-grounded learning, personalized interactive learning, applied and theoretical implications for HCI and embodied learning.
We are excited about future directions that leverage human knowledge in tandem with cognitively-motivated training objectives!
Paper:
Code:
Reach out! We're friendly and happy and love taking walks at conferences!
When humans give preferences, there are "abstractions" at play, or in other words, there are "features" that most directly contribute to their preferences.
Ex: a mushroom forager🍄 may prefer Mushroom A over B because A is more colorful. We call this a "feature preference".
In a crowdsourced study, we collect hundreds of thousands of biographies from the Internet, pair them with over 38,400 human annotations on a hybrid decision task, and find that while a better AI always improves human accuracy...
We find that policies deployed with our framework result in (1) significantly more accurate user feedback compared to seeing behavior alone, and (2) higher performance on desired test tasks with fewer human demonstrations. [6/8]
In our framework, given a human demonstration, we search through the concept space for observations that would have resulted in the policy succeeding *had a specific concept changed*. This can be seen as a *contrastive explanation*, helping isolate the cause of failure. [5/8]
But how do we reliably elicit this feedback? Problematically, humans are not always reliable identifiers of feature-level black box model failures. Inspired by the interpretability literature, we formulate a *counterfactual* approach. [4/8]
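To make the search concrete, here's a toy sketch of the counterfactual loop (the dict-valued observations and all names are hypothetical stand-ins, not the paper's API):

```python
def find_contrastive_explanation(obs, policy_succeeds, concept_values):
    """Given a failing observation `obs` (a dict concept -> value), change
    one concept at a time and re-evaluate the policy. An edit that flips
    failure to success isolates a likely cause of failure."""
    for concept, values in concept_values.items():
        for value in values:
            if value == obs[concept]:
                continue
            counterfactual = {**obs, concept: value}
            if policy_succeeds(counterfactual):
                return concept, value  # the contrastive explanation
    return None

# Toy usage: suppose the policy only succeeds when the mug is red.
explanation = find_contrastive_explanation(
    obs={"mug_color": "blue", "table_height": "low"},
    policy_succeeds=lambda o: o["mug_color"] == "red",
    concept_values={"mug_color": ["red", "blue"], "table_height": ["low", "high"]},
)
print(explanation)  # ('mug_color', 'red'): failure traced to mug color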
To see how LGA can be further extended to infer implicit human preferences, see our followup work, PLGA (Preference-Conditioned Language-Guided Abstraction) from HRI 2024!
I'm particularly excited by the direction of incorporating interpretability-based tools to help fix model failure, in the hope of leveraging end users to more efficiently perform interactive alignment of robotic policies at test-time. [7/8]
Importantly, LGA complements traditional supervised learning methods like behavior cloning (BC), WITHOUT relying on pre-trained skills, additional environment interaction, large multitask datasets, or even the ability to exhaustively describe behavior in language.
[7/n] I’m super excited about utilizing pretrained models (such as LMs) in conjunction with human feedback to interactively learn human-aligned representations for decision-making.
LGA improves sample efficiency and distributional robustness in both single- and multi-task settings, matching the performance of human-designed state abstractions while requiring a fraction of the human effort. See Moana (our Spot robot) in action!
We propose a pedagogical framework for modeling feature preferences.
Our key insight is that humans communicate preferences pragmatically: when they describe which features are important to their preference, they are also implicitly revealing which features are NOT important.
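A hedged sketch of that pragmatic reading (the names and the `eps` down-weighting are my own, not the paper's model):

```python
def pragmatic_feature_weights(mentioned, all_features, eps=0.05):
    """A cooperative speaker names the features that matter, so features
    left unmentioned are treated as (nearly) irrelevant rather than
    merely unknown. A literal listener would leave them uncertain."""
    return {f: 1.0 if f in mentioned else eps for f in all_features}

weights = pragmatic_feature_weights(
    mentioned={"colorful"},
    all_features={"colorful", "size", "stem_length"},
)
# e.g. {'colorful': 1.0, 'size': 0.05, 'stem_length': 0.05}
```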
In a user study, we found that pragmatic feature preference queries did NOT cause users to experience more frustration with providing labels vs. RLHF queries. 😡
This is an important finding, as it suggests we should continue exploring ways to learn from natural human feedback.
LGA begins by querying the user for high-level task descriptions, then uses an LM to translate these descriptions into task-relevant state abstractions.
Intuitively, this can be thought of as language-guided attention, allowing strong human priors to steer representation learning.
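Roughly, the pipeline shape might look like this (the feature list, the LM stand-in, and all names are hypothetical; the real system prompts an actual LM):

```python
FEATURES = ["mug", "laptop", "sock", "table", "lamp"]

def language_guided_abstraction(task_description, query_lm):
    """Ask a language model which state features are relevant to the task,
    then mask out everything else. `query_lm(description, features)` stands
    in for an actual LM call returning the set of relevant feature names."""
    relevant = query_lm(task_description, FEATURES)
    return [1 if f in relevant else 0 for f in FEATURES]  # binary state mask

# Toy stand-in for the LM call: keep features mentioned in the description.
mask = language_guided_abstraction(
    "bring me the mug from the table",
    query_lm=lambda desc, feats: {f for f in feats if f in desc},
)
print(mask)  # [1, 0, 0, 1, 0]
```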
Workshop at
@corl_conf
discussing representations that enable robots and humans to learn, reason & act. If these representations are inherently different, how can we align them computationally to make our interaction with robots efficient, fluent, and transparent?
@andreea7b
@TheAndiPenguin
Our insight is that *end users are uniquely positioned to recognize which concepts are irrelevant for their desired task*. If we had a way to reliably query for irrelevant concepts, then we could use data augmentation to quickly finetune the policy. [3/8]
Policies deployed in the world face different sources of distribution shift. Data augmentation can help models be more robust by varying *task-irrelevant* concepts. But how do we know what is task-irrelevant vs. -relevant? [2/8]
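As a sketch, augmentation over user-flagged irrelevant concepts might look like this (dict-based demos and all names are illustrative assumptions, not the paper's code):

```python
import random

def augment_irrelevant(demo, irrelevant, concept_values, n=10, seed=0):
    """Synthesize variants of one demonstration by resampling only the
    concepts a user marked as task-irrelevant. Because relevant concepts
    are untouched, the original action labels carry over to every variant."""
    rng = random.Random(seed)
    return [
        {**demo, **{c: rng.choice(concept_values[c]) for c in irrelevant}}
        for _ in range(n)
    ]

demo = {"mug_color": "red", "background": "kitchen"}  # mug_color is task-relevant
variants = augment_irrelevant(
    demo,
    irrelevant=["background"],
    concept_values={"background": ["kitchen", "office", "lab"]},
)
```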
@minsuk_chang
@roboticwrestler
@HsseinMzannar
Pretty math sadly isn't always reflected in real human studies :/ Hence, one must sometimes choose what to get "working" for a publication (e.g., make the math work or make the human studies work), and the choice of venue, it seems to me, reflects that priority
In bandit experiments, we find that learning from pragmatic feature preferences outperforms learning from either example-level preferences alone or pragmatic-augmented features alone, verifying that both elements are important for making use of the contextual information contained in descriptions.
[3/n] Our key insight is that changes in human behavior tell us something meaningful about these implicit preferences; in other words, different demonstrations for the same task imply that different *task-relevant features* are at play.
[4/n] We extend previous work () to also query language models for these hidden preferences, given two contrastive demonstrations. Our method utilizes off-the-shelf segmentation and captioning models to construct preference-conditioned abstractions.
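In code-shaped form, the idea might look like this (the captioner and LM calls are stand-ins for the off-the-shelf models; names are mine, not the paper's):

```python
def infer_hidden_preference(demo_a, demo_b, caption_objects, query_lm):
    """Two demonstrations of the *same* task that differ in behavior imply
    different task-relevant features. Caption the objects in each scene,
    diff the two sets, and ask an LM which differences encode a preference."""
    objects_a = caption_objects(demo_a)  # e.g. segmentation + captioning models
    objects_b = caption_objects(demo_b)
    differing = objects_a.symmetric_difference(objects_b)
    return query_lm(differing)  # e.g. "avoid electronics left on the floor"

# Toy usage with set-valued "scenes":
pref = infer_hidden_preference(
    {"laptop", "sock"}, {"sock"},
    caption_objects=lambda scene: set(scene),
    query_lm=lambda diff: f"user cares about: {sorted(diff)}",
)
print(pref)  # user cares about: ['laptop']
```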
In computational experiments, we show that tuning to the "right" complexity yields the greatest finetuning accuracy when only a small number of labels is available. [3/n]
[2/n] LMs have been deployed in robotics as general-purpose task specifiers and planners. But what happens when the user's utterance fails to convey a hidden preference?
For example: the user may prefer Spot to avoid electronics but not clothes on the ground.
[6/n] Importantly, the LM is able to model its own uncertainty when faced with "ambiguous" preferences, proactively asking the user for their true preferences when the queried preferences are high-entropy.
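A minimal sketch of an entropy gate for deciding when to ask (the threshold and names are my assumptions):

```python
import math

def should_query_user(pref_probs, threshold=0.5):
    """If the LM's distribution over candidate preferences is high-entropy
    (ambiguous), ask the user directly instead of guessing."""
    entropy = -sum(p * math.log(p) for p in pref_probs if p > 0)
    return entropy > threshold

print(should_query_user([0.5, 0.5]))    # True: ambiguous, so ask
print(should_query_user([0.98, 0.02]))  # False: confident, so proceed
```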
We introduce our full dataset, comprising 38,400 individual human judgements over 9,600 prediction tasks, as a first-of-its-kind large-scale dataset for studying human-AI collaborative decision-making, collected and evaluated on real data.
We contribute a pragmatic approach to data augmentation: we use feature-level preference data to synthesize new examples based on which features are not considered relevant. ✅
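A hedged sketch of that augmentation step (pairs as feature dicts; names and values are illustrative, not the paper's API):

```python
import random

def augment_preference_pair(pair, label, irrelevant, feature_values, n=5, seed=0):
    """If only some features drive the user's preference, resampling the
    *irrelevant* features of a labeled pair (A preferred over B) yields
    new synthetic pairs that must carry the same label."""
    rng = random.Random(seed)
    a, b = pair

    def resample(item):
        return {**item, **{f: rng.choice(feature_values[f]) for f in irrelevant}}

    return [((resample(a), resample(b)), label) for _ in range(n)]

# Mushroom example: color drives the preference; size does not.
pairs = augment_preference_pair(
    pair=({"color": "bright", "size": 3}, {"color": "dull", "size": 5}),
    label="A",  # A preferred over B
    irrelevant=["size"],
    feature_values={"size": [1, 2, 3, 4, 5]},
)
```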
[5/n] Policies trained with PLGA generalize to new environments, such as Spot successfully avoiding novel objects like laptops at test time.
More experiments, discussion, and our user study can be found in the paper:
I had a great time on this paper with awesome collaborators
@dabelcs
@tianminshu
and Yuying Sun!!
Reach out if you'd like to chat more!