![Rebecca Qian Profile](https://pbs.twimg.com/profile_images/1702368465284108288/HnEpKlhB_x96.jpg)
Rebecca Qian
@rebeccatqian
Followers
663
Following
412
Statuses
171
llm evals @PatronusAI, previously research eng @MetaAI
New York, USA
Joined April 2022
RT @PatronusAI: 1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀 - Open source, open weights, open code - E…
0
23
0
RT @PatronusAI: On the 9th day of Christmas, we are announcing… 360 Degree Human Annotation! 🌲
0
1
0
PSA if OpenAI is down for you and you can’t run LLM judge evals 😶‍🌫️ you can use Lynx instead @PatronusAI
0
0
3
RT @PatronusAI: To celebrate the holiday season, it’s time for 12 Days of Christmas at @PatronusAI 🌲 We’re excited to drop 12 new eval laun…
0
1
0
RT @EpochAIResearch: 1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collabor…
0
406
0
RT @virattt: FinanceBench is my favorite eval dataset. I reformatted the original into 3 columns: • question • answer • document This mak…
0
5
0
@jiayq @PatronusAI Thank you @jiayq and @LeptonAI for allowing more people to build vertical AI models 🙂
0
0
1
RT @jiayq: Congrats to @PatronusAI @rebeccatqian et al for the launch! Custom models are not only more cost effective, but *better* than g…
0
4
0
RT @gokulr: Congrats @PatronusAI on the API launch! Great platform for builders to accelerate AI development.
0
3
0
RT @andriy_mulyar: Tried out the platform for evals a month ago for some internal work - will be posting about it soon! Improving quickly!…
0
4
0
nothing spookier than AI hallucinations 🎃 happy halloween and try out our API and SDK ->
1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀 - 20% more accurate than ragas on hallucination detection - Beats Perspective and Llama Guard on safety tasks by 28% and 11% - Excels in practical domains like finance and customer support Hundreds of elite AI teams across companies like @hospitable, @ExaAILabs, and Algomo use Patronus to do alpha evals ⚡ Try it out today:
1
2
7
RT @PatronusAI: 1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀 - 20% more accurate tha…
0
19
0
Some interesting findings on Llama Guard 👀 this is why we need rigorous independent benchmarking. Shoutout to @getdarshan and @sunitha_selvan for uncovering Llama Guard weaknesses 🦙
Llama Guard is Off Duty 😲 It’s weak at toxicity detection! We benchmarked popular toxicity datasets spanning languages like Portuguese, Ukrainian, and Turkish, and found that Llama Guard has a very high false negative rate for toxic content! We found that base models like Llama 3.1 do all the heavy lifting on toxicity filtering, and that the joint usage of Llama Guard might be redundant. 🤔 It’s time for a thread 🧵
0
0
7
RT @PortkeyAI: Potterverse unite! 🪄 Thrilled to share the @PatronusAI's industry-leading evaluators for retrieval accuracy, hallucination…
0
2
0
RT @PatronusAI: Introducing @PatronusAI + @PortkeyAI 🚀 @PortkeyAI is the leading open source AI gateway. It’s blazing fast and supports ov…
0
6
0
Check out our webinar with @DbrxMosaicAI on the origin story for #Lynx and a deep dive into hallucinations and automated RAG evals 🔥 @sunitha_selvan @cojennin @dennylee
Join @rebeccatqian, Sunitha Ravi, @cojennin, and @dennylee for our upcoming webinar on the creation of #Lynx, a state-of-the-art hallucination detection model from @PatronusAI on August 1, 2024 9:05am PT. #LLMs used in Retrieval Augmented Generation (#RAG) systems often produce hallucinations, which result in misinformation to the end user. Lynx outperforms GPT-4o, Claude-3.5-Sonnet, and other LLM-as-a-judge models, and excels in advanced reasoning across complex real-world domains like finance and medicine. We are excited to discuss the training and development of Lynx and share research findings
0
5
11
RT @DbrxMosaicAI: Join @rebeccatqian, Sunitha Ravi, @cojennin, and @dennylee for our upcoming webinar on the creation of #Lynx, a state-o…
0
5
0
RT @PatronusAI: 1/ Introducing Lynx v1.1: an 8B State-of-the-Art RAG hallucination detection model 🚀 - Beats Claude-3.5-Sonnet on HaluBenc…
0
26
0