Christina Baek

@_christinabaek

Followers 1K · Following 357 · Statuses 81

PhD student @mldcmu | Past: intern @GoogleAI | Robust ML

Joined June 2021
@_christinabaek
Christina Baek
4 months
Chatbots are often augmented w/ new facts by context from the user or retriever. Models must adapt instead of hallucinating outdated facts. In this work w/@goyalsachin007, @zicokolter, @AdtRaghunathan, we show that instruction tuning fails to reliably improve this behavior! [1/n]
1
22
106
@_christinabaek
Christina Baek
2 months
@saprmarks @RichardMCNgo Agree results are cool, but the problem w/ "alignment faking" is that it's explained in a really anthropomorphic way in the blog/paper. Writing that blurs metaphors to human behavior with the phenomenon (i.e. bad reasoning w/ mixed preferences) can misinform the lay reader.
0
0
0
@_christinabaek
Christina Baek
2 months
RT @RichardMCNgo: My main problem is with the phrase “alignment faking”, which is used extensively throughout the paper. This is a value-la…
0
16
0
@_christinabaek
Christina Baek
2 months
RT @AlexRobey23: After rejections at ICLR, ICML, and NeurIPS, I'm happy to report that "Jailbreaking Black Box LLMs in Twenty Queries" (i.e…
0
16
0
@_christinabaek
Christina Baek
2 months
RT @RuntianZhai: While many are condemning the #NeurIPS speaker, let me suggest something actionable: Stop putting ICML deadline in the sam…
0
1
0
@_christinabaek
Christina Baek
2 months
2. In this work w/ Eungyeup and Mingjie, we found that while an ID vs. OOD performance gap remains after test-time adaptation, applying TTA dramatically improves on-the-line trends! Details: Wednesday (Dec 11) East Exhibit Hall A-C #4501
@EungyeupK
Eungyeup Kim
2 months
Under distribution shifts, how can we evaluate model performance in OOD without labels? In our #NeurIPS2024 paper, we show how test-time adaptation (TTA) strengthens the fascinating "accuracy/agreement-on-the-line" trend—improving OOD predictability without labels! 🧵👇
0
0
2
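The "agreement-on-the-line" idea in the thread above can be sketched in a few lines. This is a toy illustration under my own assumptions (function names and the synthetic numbers are mine, not from the paper, and it assumes NumPy/SciPy): across many model pairs, OOD agreement tracks ID agreement linearly after a probit transform, and since agreement requires no labels, the fitted line can be reused to map a model's ID accuracy to a predicted OOD accuracy.

```python
import numpy as np
from scipy.stats import norm

def probit(p):
    # map probabilities into probit space, clipping away the 0/1 endpoints
    return norm.ppf(np.clip(p, 1e-6, 1 - 1e-6))

def fit_agreement_line(id_agreement, ood_agreement):
    # least-squares fit of probit(OOD agreement) vs. probit(ID agreement),
    # computed across many model pairs -- no OOD labels needed
    slope, intercept = np.polyfit(probit(id_agreement), probit(ood_agreement), 1)
    return slope, intercept

def predict_ood_accuracy(id_accuracy, slope, intercept):
    # reuse the agreement-fit line to estimate OOD accuracy from ID accuracy
    return norm.cdf(slope * probit(id_accuracy) + intercept)

# hypothetical agreement rates for three model pairs
slope, intercept = fit_agreement_line(
    np.array([0.80, 0.85, 0.90]),   # ID agreement
    np.array([0.60, 0.68, 0.78]))   # OOD agreement
```

The design point is that agreement between two models is observable on unlabeled OOD data, so the linear trend can be estimated without ever touching OOD labels.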
@_christinabaek
Christina Baek
2 months
RT @danielkty96: Can Agreement-on-the-Line (AGL) predict the OOD performance of Foundation Models without labels? In our work, we show tha…
0
6
0
@_christinabaek
Christina Baek
2 months
This is great work by @EungyeupK. It turns out that applying test-time adaptation improves accuracy- and agreement-on-the-line trends even on datasets like Camelyon17! A simple explanation: TTA collapses distribution shifts into "scale shifts" in the penultimate representation space.
@EungyeupK
Eungyeup Kim
2 months
Under distribution shifts, how can we evaluate model performance in OOD without labels? In our #NeurIPS2024 paper, we show how test-time adaptation (TTA) strengthens the fascinating "accuracy/agreement-on-the-line" trend—improving OOD predictability without labels! 🧵👇
0
0
16
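For readers unfamiliar with test-time adaptation, a minimal TENT-style step (a common TTA baseline: entropy minimization over unlabeled test batches, updating only the batch-norm affine parameters) looks roughly like the sketch below. This is not necessarily the exact method in the paper, and `tta_step` and its defaults are my own illustrative names; it assumes PyTorch.

```python
import torch
import torch.nn as nn

def entropy(logits):
    # mean prediction entropy of a batch; clamp log-probs to avoid -inf * 0
    probs = logits.softmax(dim=1)
    return -(probs * probs.log().clamp(min=-100)).sum(dim=1).mean()

def tta_step(model, batch, lr=1e-3):
    # adapt only the BatchNorm affine parameters; everything else stays frozen
    params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()  # use test-batch statistics instead of running stats
            params += [m.weight, m.bias]
    opt = torch.optim.SGD(params, lr=lr)
    loss = entropy(model(batch))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# usage: one adaptation step on an unlabeled test batch
torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2))
loss = tta_step(model, torch.randn(8, 3, 8, 8))
```

The tweet's observation is about what this kind of adaptation does to the geometry of the penultimate features, which is why on-the-line trends tighten afterwards.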
@_christinabaek
Christina Baek
2 months
RT @EungyeupK: Under distribution shifts, how can we evaluate model performance in OOD without labels? In our #NeurIPS2024 paper, we show…
0
23
0
@_christinabaek
Christina Baek
2 months
RT @JunhongShen1: 🧵1/ Introducing ScribeAgent 🤖! Using extensive 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝘄𝗲𝗯 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝗱𝗮𝘁𝗮, we at @mldcmu and @ScribeHow have adapted 𝗴𝗲…
0
30
0
@_christinabaek
Christina Baek
2 months
RT @dylanjsam: Contrastive VLMs (CLIP) lack the structure of text embeddings, like satisfying analogies via arithmetic (king - man = queen)…
0
48
0
@_christinabaek
Christina Baek
3 months
RT @steph_milani: 🇨🇦 Hi! I’m attending my last @NeurIPSConf as a PhD student, presenting Patient-Ψ at a few workshops. I'm on the job mark…
0
31
0
@_christinabaek
Christina Baek
3 months
RT @pratyushmaini: 1/In our new blog post, we express a very nuanced point. Amidst the chaos around eval contamination, @scale_AI came up…
0
10
0
@_christinabaek
Christina Baek
3 months
RT @YiMaTweets: I am recruiting PhD students as HKU now. Welcome top students who are passionate about machine intelligence to apply -- bet…
0
44
0
@_christinabaek
Christina Baek
3 months
RT @katie_kang_: LLMs excel at fitting finetuning data, but are they learning to reason or just parroting🦜? We found a way to probe a mode…
0
118
0
@_christinabaek
Christina Baek
3 months
RT @Tanishq97836660: [1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! W…
0
161
0
@_christinabaek
Christina Baek
3 months
RT @FanPu_Zeng: This recent paper from @deepcohen @alex_damian_ and friends is a gem - proposing a technique called "central flow" to analy…
0
23
0
@_christinabaek
Christina Baek
3 months
RT @ZhengyangGeng: Thrilled to announce Score Implicit Matching (SIM) for building 1-step deep generative models! The key idea is that, al…
0
12
0
@_christinabaek
Christina Baek
3 months
RT @goyalsachin007: Inference with VLMs is costly, thanks to 500+ image tokens. So… should you use a smaller model or run a bigger model on…
0
59
0