Rebecca Qian Profile
Rebecca Qian

@rebeccatqian

Followers: 663
Following: 412
Statuses: 171

llm evals @PatronusAI, previously research eng @MetaAI

New York, USA
Joined April 2022
@rebeccatqian
Rebecca Qian
2 months
RT @PatronusAI: 1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀 - Open source, open weights, open code - E…
0
23
0
@rebeccatqian
Rebecca Qian
2 months
RT @PatronusAI: On the 9th day of Christmas, we are announcing… 360 Degree Human Annotation! 🌲
0
1
0
@rebeccatqian
Rebecca Qian
2 months
PSA if OpenAI is down for you and you can’t run LLM judge evals 😶‍🌫️ you can use Lynx instead @PatronusAI
0
0
3
@rebeccatqian
Rebecca Qian
2 months
RT @PatronusAI: To celebrate the holiday season, it’s time for 12 Days of Christmas at @PatronusAI 🌲 We’re excited to drop 12 new eval laun…
0
1
0
@rebeccatqian
Rebecca Qian
3 months
RT @EpochAIResearch: 1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collabor…
0
406
0
@rebeccatqian
Rebecca Qian
3 months
RT @virattt: FinanceBench is my favorite eval dataset. I reformatted the original into 3 columns: • question • answer • document This mak…
0
5
0
@rebeccatqian
Rebecca Qian
3 months
@jiayq @PatronusAI Thank you @jiayq and @LeptonAI for allowing more people to build vertical AI models 🙂
0
0
1
@rebeccatqian
Rebecca Qian
3 months
RT @jiayq: Congrats to @PatronusAI @rebeccatqian et al for the launch! Custom models are not only more cost effective, but *better* than g…
0
4
0
@rebeccatqian
Rebecca Qian
4 months
RT @gokulr: Congrats @PatronusAI on the API launch! Great platform for builders to accelerate AI development.
0
3
0
@rebeccatqian
Rebecca Qian
4 months
RT @andriy_mulyar: Tried out the platform for evals a month ago for some internal work - will be posting about it soon! Improving quickly!…
0
4
0
@rebeccatqian
Rebecca Qian
4 months
nothing spookier than AI hallucinations 🎃 happy halloween and try out our API and SDK ->
@PatronusAI
PatronusAI
4 months
1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀
- 20% more accurate than ragas on hallucination detection
- Beats Perspective and Llama Guard on safety tasks by 28% and 11%
- Excels in practical domains like finance and customer support
Hundreds of elite AI teams across companies like @hospitable, @ExaAILabs, and Algomo use Patronus to do alpha evals ⚡ Try it out today:
1
2
7
@rebeccatqian
Rebecca Qian
4 months
RT @PatronusAI: 1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀 - 20% more accurate tha…
0
19
0
@rebeccatqian
Rebecca Qian
6 months
Some interesting findings on Llama Guard 👀 this is why we need rigorous independent benchmarking. Shoutout to @getdarshan and @sunitha_selvan for uncovering Llama Guard weaknesses 🦙
@PatronusAI
PatronusAI
6 months
Llama Guard is Off Duty 😲 It’s weak at toxicity detection! We benchmarked popular toxicity datasets spanning languages like Portuguese, Ukrainian, and Turkish, and found that Llama Guard has a very high false negative rate for toxic content! We found that base models like Llama 3.1 do all the heavy lifting on toxicity filtering, and that the joint usage of Llama Guard might be redundant. 🤔 It’s time for a thread 🧵
0
0
7
@rebeccatqian
Rebecca Qian
6 months
RT @PortkeyAI: Potterverse unite! 🪄 Thrilled to share the @PatronusAI's industry-leading evaluators for retrieval accuracy, hallucination…
0
2
0
@rebeccatqian
Rebecca Qian
6 months
RT @PatronusAI: Introducing @PatronusAI + @PortkeyAI 🚀 @PortkeyAI is the leading open source AI gateway. It’s blazing fast and supports ov…
0
6
0
@rebeccatqian
Rebecca Qian
7 months
Check out our webinar with @DbrxMosaicAI on the origin story for #Lynx and a deep dive into hallucinations and automated RAG evals 🔥 @sunitha_selvan @cojennin @dennylee
@DbrxMosaicAI
Databricks Mosaic Research
7 months
Join @rebeccatqian, Sunitha Ravi, @cojennin, and @dennylee for our upcoming webinar on the creation of #Lynx, a state-of-the-art hallucination detection model from @PatronusAI on August 1, 2024 9:05am PT. #LLMs used in Retrieval Augmented Generation (#RAG) systems often produce hallucinations, which result in misinformation to the end user. Lynx outperforms GPT-4o, Claude-3.5-Sonnet, and other LLM-as-a-judge models, and excels in advanced reasoning across complex real-world domains like finance and medicine. We are excited to discuss the training and development of Lynx and share research findings
0
5
11
@rebeccatqian
Rebecca Qian
7 months
RT @DbrxMosaicAI: Join @rebeccatqian, Sunitha Ravi, @cojennin, and @dennylee for our upcoming webinar on the creation of #Lynx, a state-o…
0
5
0
@rebeccatqian
Rebecca Qian
7 months
@apsdehal Congrats Aman and team!!
0
0
1
@rebeccatqian
Rebecca Qian
7 months
RT @PatronusAI: 1/ Introducing Lynx v1.1: an 8B State-of-the-Art RAG hallucination detection model 🚀 - Beats Claude-3.5-Sonnet on HaluBenc…
0
26
0