![Kaiser Sun Profile](https://pbs.twimg.com/profile_images/1783303425972674560/OnFYnphf_x96.jpg)
Kaiser Sun
@KaiserWhoLearns
Followers: 962
Following: 1K
Statuses: 293
Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS. they/them 🏳️‍🌈
My fantasea
Joined May 2021
Training a large language model? Pre-train then fine-tune! But how does pre-training affect downstream fine-tuning performance? What is learned during pre-training vs. fine-tuning? Here are some results. (🧵below) #NLProc
6
30
130
RT @Antonin_Poche: 🚀 Thrilled to share our new paper (the first of my PhD)! How can we compare concept-based #XAI methods in #NLProc? Co…
0
4
0
RT @SonglinYang4: I've created slides for those curious about the recent rapid progress in linear attention: from linear attention to Light…
0
172
0
RT @SirrahChan: Here's my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models. Not…
0
244
0
@LChoshen Re the “anger” comment: while I am not angry about this, I can understand those who find it depersonalizing. In my original tweet, I think this reflects a good example of a stereotype to be included in a dataset that we’re working on, so I was asking about the license question.
1
0
0
@LChoshen Morally speaking, people are responsible for being aware of how their words might contribute to systemic bias, even unintentionally. I truly want to believe that the speaker might not have malicious intent, but their words still caused discomfort among the audience.
0
0
2
@LChoshen I would say that nationality here is not a needed context, as it does not encapsulate the school but the student. More context from the talk: this is the only page that mentions the student’s nationality, and the exact school is not mentioned.
0
0
2
RT @repro_challenge: We are excited to announce MLRC 2025, the eighth iteration of MLRC, which will also be its first in-person edition @ P…
0
4
0
If someone asked me why do research with Mark, this is an answer🔥 Swords are essential items for defending a thesis. We need to be well-equipped.
New PhD tradition at @JHUCompSci @jhuclsp ! 🎓 We now knight our graduating PhD students and present them with a sword. ⚔️🤺 Congratulations @pocaguirre
1
5
31
@qi2peng2 What if a and b are not from an Abelian group? Ah, perhaps I am out of the field to make this comment.
0
0
3
RT @pocaguirre: PhDone! 🎉 Thanks to everyone who was there along the way, family, friends, colleagues and advisors! Yes, I am a doctor now.…
0
8
0
RT @AkariAsai: 🚨 I’m on the job market this year! 🚨 I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations…
0
117
0
@jbhuang0604 I almost thought this was a screenshot of my calendar before seeing “kids to/from school”, which is usually “taking lectures given by my cats” for me.
1
0
1
RT @soldni: OLMo 2 is out 🥳 7B and 13B trained on 5T tokens, and meticulously instruction tuned using Tulu 3 recipe. Simply the best fully…
0
28
0
RT @AkariAsai: 1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚 @uwnlp @allen_ai With open m…
0
267
0
@jasmijnbastings @maria_antoniak More starter packs: JHU CLSP: Yale Digital Ethics Center: Oxford Internet Institute: List compiled by @maosbot:
0
0
2