Haoyi Qiu

@HaoyiQiu

Followers
857
Following
841
Statuses
146

Research intern @SFResearch ☁️ PhD student @UCLANLP 🧸 BS in CS&Math @UMich 〽️ #NLP 🌷

Los Angeles, CA
Joined October 2018
@HaoyiQiu
Haoyi Qiu
21 days
Thrilled to share that CASA has been accepted to @naaclmeeting #NAACL2025 (Findings)! 🎉 Can’t wait to see you all in Albuquerque! 🌟 As I wrap up 2024, my first year as a PhD student (started April 2024), I’m overwhelmed with gratitude. This year has been a journey of growth, discovery, and resilience. From publishing 5 papers across NAACL, NeurIPS, ACL, ACM MM, and TKDE to exploring fascinating topics like multimodal hallucination, factuality, safety, and culturally aware agents—every step has been shaped by brilliant collaborators, mentors, and endless support. Here’s to staying humble, working hard, and embracing the opportunities 2025 holds. Let’s keep moving forward! 💪
@HaoyiQiu
Haoyi Qiu
3 months
🌐 Are LLM agents prepared to navigate the rich diversity of cultural and social norms? 🏠 CASA tests them on real-world tasks like online shopping and social discussion forums, revealing that current agents show less than 10% awareness and over 40% norm violations. 🧠 We’re bridging this gap by combining fine-tuning on regional data with strategic prompts to create agents that better understand our world’s diversity. Read the full paper for all insights! 📑 Grateful for the incredible team at Salesforce AI Research @salesforce !
0
4
38
@HaoyiQiu
Haoyi Qiu
4 days
@PranavVenkit Congratulations 🎉
1
0
1
@HaoyiQiu
Haoyi Qiu
16 days
RT @SFResearch: 🔬Advanced agent systems, RAG evaluation, instruction-following and more. Our team's accepted papers at #NAACL2025 span from…
0
8
0
@HaoyiQiu
Haoyi Qiu
21 days
RT @steeve__huang: Excited to share that CRMArena has been accepted by #NAACL2025 @naaclmeeting. See you in Albuquerque🚡! @SFResearch
0
8
0
@HaoyiQiu
Haoyi Qiu
28 days
RT @webagentlab: 5⃣️ Evaluating Cultural and Social Awareness of LLM Web Agents Haoyi Qiu @HaoyiQiu, Alexander R. Fabbri @alexfabbri4 , Di…
0
2
0
@HaoyiQiu
Haoyi Qiu
1 month
RT @WeijiangLi2: 1/6 📢 Excited to share our new paper on using language models to classify genetic variants! ClinVar-BERT helps prioritize…
0
2
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @VioletNPeng: I’m grateful for the enormous support from the community! It’s an honor to serve, and I’m excited to work hard alongside a…
0
8
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @sarahookr: We have released Global-MMLU-lite 🔥 This is designed to run more efficiently while giving a good estimate of overall perfor…
0
24
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @AlexanderSpangh: ✨✨✨Hello everyone, I’m on the faculty job market this year.✨✨✨ I’m completing my PhD at USC, where I study agentic pla…
0
21
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @srush_nlp: This year, I have an exceptional student on the academic market. Wenting Zhao (@wzhao_nlp) builds systems that reason in na…
0
92
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @tyao923: 🧐How can agents effectively learn skill prompting, planning, and maximizing rewards from large amounts of unlabeled data? 😉Com…
0
7
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @zy27962986: 🚀🚀🚀Want to develop a cutting-edge video generation model towards Sora? Please dive into Apple’s latest recipe and studies f…
0
46
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @Wade_Yin9712: A behavior is safe in country A, but may be unsafe in country B. Check out our #NeurIPS2024 SafeWorld! It evaluates ho…
0
6
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @ZCJW2021: 🔥Thrilled to share our #NeurIPS2024 paper, “JourneyBench⚖️: A Challenging One-Stop Vision-Language Understanding Benchmark of…
0
10
0
@HaoyiQiu
Haoyi Qiu
2 months
RT @steeve__huang: Do LLMs know the 🧑‍🤝‍🧑 cultural and ⚖️ legal safety across our globe? Our #NeurIPS2024 paper 🌍 SafeWorld dives into th…
0
7
0
@HaoyiQiu
Haoyi Qiu
2 months
Just landed in Vancouver 🇨🇦 and I'm beyond excited for my first #NeurIPS and my very first ML conference! 🌱 I'll be presenting our 𝕊𝕒𝕗𝕖𝕎𝕠𝕣𝕝𝕕 paper on Wednesday, Dec 11th at East Exhibit Hall A-C from 11:00 AM to 2:00 PM (#3308). No need to skip lunch—we've got delicious snacks ready for you to enjoy while we chat! I'm also excited to dive into conversations on safety, alignment, cultural analytics, especially for LLMs, LVLMs, and agents. Stop by, grab a snack, and let's connect!
@HaoyiQiu
Haoyi Qiu
2 months
🌍Are LLMs aware of cultural and legal safety in today’s geo-diverse world? 🚀Introducing SafeWorld, our #NeurIPS2024 paper and benchmark assessing LLMs’ understanding of geo-diverse safety, based on cultural norms and policies across 50 countries and 493 regions/races. ⚖️We also propose a multi-dimensional framework for evaluating contextual appropriateness, accuracy, and comprehensiveness, revealing major gaps in current LLMs. 🧨To address this, we train SafeWorldLM using DPO, achieving SOTA performance and a 20% higher global human evaluator rating in helpfulness and harmfulness over competing models, including GPT-4o. 🔗Paper: 💻 GitHub: 🫶🏻This is a joint leading effort with @Wade_Yin9712. Also many thanks to the amazing team @steeve__huang, @kaiwei_chang, and @VioletNPeng for their hard work. See the thread below for more details and results from our paper. 🧵
0
6
54
@HaoyiQiu
Haoyi Qiu
2 months
RT @ChujieZheng: Thrilled to introduce ProcessBench, our benchmark for measuring the ability to identify process errors in mathematical rea…
0
51
0
@HaoyiQiu
Haoyi Qiu
2 months
(6/n) This research was made possible through amazing collaborations between @uclanlp and @SFResearch. 🫶🏻This is a joint leading effort with @Wade_Yin9712. Also many thanks to the amazing team @steeve__huang, @kaiwei_chang, and @VioletNPeng for their hard work. 🌟 (n=6)
0
0
3
@HaoyiQiu
Haoyi Qiu
2 months
(5/n) We show that our SafeWorldLM model significantly outperforms competitors, including GPT-4o, across all evaluation dimensions!
0
0
4
@HaoyiQiu
Haoyi Qiu
2 months
(4/n) We've also developed SafeWorldLM, a model trained for outstanding geo-diverse safety alignment, outperforming even top proprietary models like GPT-4o by wide margins on all safety dimensions.
0
0
4