Rebecca Qian @rebeccatqian profile

Rebecca Qian

@rebeccatqian

Followers

663

Following

412

Statuses

171

llm evals @PatronusAI, previously research eng @MetaAI

New York, USA

Joined April 2022

Rebecca Qian

@rebeccatqian

2 months

RT @PatronusAI: 1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀 - Open source, open weights, open code - E…

0

23

0

Rebecca Qian

@rebeccatqian

2 months

RT @PatronusAI: On the 9th day of Christmas, we are announcing… 360 Degree Human Annotation! 🌲

0

1

0

Rebecca Qian

@rebeccatqian

2 months

PSA if OpenAI is down for you and you can’t run LLM judge evals 😶‍🌫️ you can use Lynx instead @PatronusAI

0

3

Rebecca Qian

@rebeccatqian

2 months

RT @PatronusAI: To celebrate the holiday season, it’s time for 12 Days of Christmas at @PatronusAI 🌲 We’re excited to drop 12 new eval laun…

0

1

0

Rebecca Qian

@rebeccatqian

3 months

RT @EpochAIResearch: 1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collabor…

0

406

0

Rebecca Qian

@rebeccatqian

3 months

RT @virattt: FinanceBench is my favorite eval dataset. I reformatted the original into 3 columns: • question • answer • document This mak…

0

5

0

Rebecca Qian

@rebeccatqian

3 months

@jiayq @PatronusAI Thank you @jiayq and @LeptonAI for allowing more people to build vertical AI models 🙂

0

1

Rebecca Qian

@rebeccatqian

3 months

RT @jiayq: Congrats to @PatronusAI @rebeccatqian et al for the launch! Custom models are not only more cost effective, but *better* than g…

0

4

0

Rebecca Qian

@rebeccatqian

4 months

RT @gokulr: Congrats @PatronusAI on the API launch! Great platform for builders to accelerate AI development.

0

3

0

Rebecca Qian

@rebeccatqian

4 months

RT @andriy_mulyar: Tried out the platform for evals a month ago for some internal work - will be posting about it soon! Improving quickly!…

0

4

0

Rebecca Qian

@rebeccatqian

4 months

nothing spookier than AI hallucinations 🎃 happy halloween and try out our API and SDK ->

PatronusAI

@PatronusAI

4 months

1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀
- 20% more accurate than ragas on hallucination detection
- Beats Perspective and Llama Guard on safety tasks by 28% and 11%
- Excels in practical domains like finance and customer support
Hundreds of elite AI teams across companies like @hospitable, @ExaAILabs, and Algomo use Patronus to do alpha evals ⚡ Try it out today:

1

2

7

Rebecca Qian

@rebeccatqian

4 months

RT @PatronusAI: 1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀 - 20% more accurate tha…

0

19

0

Rebecca Qian

@rebeccatqian

6 months

Some interesting findings on Llama Guard 👀 this is why we need rigorous independent benchmarking. Shoutout to @getdarshan and @sunitha_selvan for uncovering Llama Guard weaknesses 🦙

PatronusAI

@PatronusAI

6 months

Llama Guard is Off Duty 😲 It’s weak at toxicity detection! We benchmarked popular toxicity datasets spanning languages like Portuguese, Ukrainian, and Turkish, and found that Llama Guard has a very high false negative rate for toxic content! We found that base models like Llama 3.1 do all the heavy lifting on toxicity filtering, and that the joint usage of Llama Guard might be redundant. 🤔 It’s time for a thread 🧵

0

7

Rebecca Qian

@rebeccatqian

6 months

RT @PortkeyAI: Potterverse unite! 🪄 Thrilled to share the @PatronusAI's industry-leading evaluators for retrieval accuracy, hallucination…

0

2

0

Rebecca Qian

@rebeccatqian

6 months

RT @PatronusAI: Introducing @PatronusAI + @PortkeyAI 🚀 @PortkeyAI is the leading open source AI gateway. It’s blazing fast and supports ov…

0

6

0

Rebecca Qian

@rebeccatqian

7 months

Check out our webinar with @DbrxMosaicAI on the origin story for #Lynx and a deep dive into hallucinations and automated RAG evals 🔥 @sunitha_selvan @cojennin @dennylee

Databricks Mosaic Research

@DbrxMosaicAI

7 months

Join @rebeccatqian, Sunitha Ravi, @cojennin, and @dennylee for our upcoming webinar on the creation of #Lynx, a state-of-the-art hallucination detection model from @PatronusAI on August 1, 2024 9:05am PT. #LLMs used in Retrieval Augmented Generation (#RAG) systems often produce hallucinations, which result in misinformation to the end user. Lynx outperforms GPT-4o, Claude-3.5-Sonnet, and other LLM-as-a-judge models, and excels in advanced reasoning across complex real-world domains like finance and medicine. We are excited to discuss the training and development of Lynx and share research findings

0

5

11

Rebecca Qian

@rebeccatqian

7 months

RT @DbrxMosaicAI: Join @rebeccatqian, Sunitha Ravi, @cojennin, and @dennylee for our upcoming webinar on the creation of #Lynx, a state-o…

0

5

0

Rebecca Qian

@rebeccatqian

7 months

@apsdehal Congrats Aman and team!!

0

1

Rebecca Qian

@rebeccatqian

7 months

RT @PatronusAI: 1/ Introducing Lynx v1.1: an 8B State-of-the-Art RAG hallucination detection model 🚀 - Beats Claude-3.5-Sonnet on HaluBenc…

0

26

0