Adaptive ML

@AdaptiveML

Followers 126 · Following 12 · Statuses 17

AI, Tuned to Production. Continuously evaluate and adapt models with production feedback to surpass frontier performance—from your cloud or ours.

Paris & New York
Joined October 2023
@AdaptiveML
Adaptive ML
3 months
The most exciting prospect of RL for post-training is the ability to craft tailor-made rewards to achieve specific goals; more to follow 🚀. 📷 Blogpost: 🙏 Thanks to @colinraffel & @aahmadian_ for collaborating on this project!
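To make "tailor-made rewards" concrete, here is a minimal sketch (the function name and scoring scheme are our illustrative assumptions, not Adaptive ML's actual reward design) of a reward that favors completions which parse as valid JSON and stay concise:

```python
import json

def tailored_reward(completion: str, max_chars: int = 200) -> float:
    """Score a model completion against two task-specific goals."""
    reward = 0.0
    try:
        json.loads(completion)        # goal 1: output must parse as JSON
        reward += 1.0
    except ValueError:
        reward -= 1.0
    if len(completion) <= max_chars:  # goal 2: stay concise
        reward += 0.5
    return reward

print(tailored_reward('{"answer": 42}'))   # → 1.5
print(tailored_reward('not json at all'))  # → -0.5
```

Any programmatically checkable goal (format, length, tool-call validity) can be scored this way and plugged into an RL post-training loop.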
@AdaptiveML
Adaptive ML
3 months
Our latest blog, "From Zero to PPO: Understanding the Path to Helpful AI Models," builds an intuitive understanding of PPO and how it differs from other tuning techniques:
@AdaptiveML
Adaptive ML
3 months
📣 At #VDS2024, Adaptive ML CTO @BaptistePannier joined Ahmed Menshawy of @Mastercard, @margaridagsl of @poolsideai, and @NeemaBal of NEEMA AI for a discussion on the challenges and rewards of getting GenAI into production. 📣 🎉 Thanks to @VDS_event for hosting! 🎉
[Four photos from the #VDS2024 panel]
@AdaptiveML
Adaptive ML
3 months
A more effective training process is for the LLM to suggest completions and learn from the evaluation of those completions instead. See how in our blog, "From Zero to PPO: Understanding the Path to Helpful AI Models" -
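The suggest-then-evaluate loop can be sketched with a toy REINFORCE-style policy over two canned completions (all names, rewards, and hyperparameters here are illustrative assumptions, not the blog's code):

```python
import math
import random

random.seed(0)

# Toy policy: logits over two canned completions.
logits = [0.0, 0.0]
completions = ["helpful answer", "unhelpful answer"]
rewards = {"helpful answer": 1.0, "unhelpful answer": -1.0}
lr = 0.5

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(100):
    probs = softmax(logits)
    i = random.choices(range(2), weights=probs)[0]  # model *suggests* a completion
    r = rewards[completions[i]]                     # completion is *evaluated*
    # REINFORCE update: grad of log-softmax is (1 - p_i) for the chosen
    # index and -p_j for the others, scaled by the reward.
    for j in range(2):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

probs = softmax(logits)
print(round(probs[0], 2))  # policy now strongly prefers the helpful completion
```

The policy never sees a "correct" target; it only samples, gets scored, and shifts probability mass toward higher-reward completions.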
@AdaptiveML
Adaptive ML
3 months
We connect the dots between rejection sampling, REINFORCE, and Advantage Actor Critic, drawing a deeper understanding of how to tune LLMs to deliver helpful, harmless, and honest answers. Read our blog:
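A minimal sketch of that progression, under our own toy assumptions (a running mean stands in for the learned critic; this is not the blog's code):

```python
# Rewards assigned to four sampled completions for one prompt (made-up values).
rewards = [0.2, 0.9, 0.4, 0.7]

# Rejection sampling (best-of-n): train only on the top-scoring completion.
best = max(range(len(rewards)), key=lambda i: rewards[i])

# REINFORCE: every completion contributes, weighted by its raw reward.
reinforce_weights = rewards

# Advantage Actor-Critic: subtract a baseline to reduce variance, so
# below-average completions are actively pushed down, not just up-weighted less.
baseline = sum(rewards) / len(rewards)  # stand-in for the value model
advantages = [r - baseline for r in rewards]

print(best)        # index of the best-of-n completion
print(advantages)  # centered weights that sum to zero
```

Each step reuses more of the sampled data: rejection sampling discards all but one completion, REINFORCE uses them all, and the advantage formulation uses them all with lower-variance updates.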
@AdaptiveML
Adaptive ML
3 months
Taken at face value, PPO is puzzling. It involves four different versions of the model interacting together (policy, value, reward, and reference), and is driven by an intricate loss function. In our blog, we build up to PPO, starting from supervised fine-tuning (SFT).
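For intuition, PPO's clipped surrogate objective for a single token can be sketched as follows (a simplified illustration: the KL penalty against the reference model and the value loss are omitted, and the inputs are made-up numbers):

```python
import math

def ppo_clip_objective(logp_new: float, logp_old: float,
                       advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate objective for one token.

    logp_new / logp_old: log-probs under the current and pre-update policy;
    advantage: computed from the reward and value models.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    # Take the pessimistic (minimum) of the unclipped and clipped terms,
    # so large policy steps stop earning extra objective.
    return min(ratio * advantage, clipped * advantage)

# A large policy step on a positive advantage is capped at 1 + eps:
print(ppo_clip_objective(logp_new=-0.5, logp_old=-1.5, advantage=1.0))  # → 1.2
```

The clipping is what makes the optimization "proximal": the policy cannot profit from drifting far from the policy that generated the data.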
@AdaptiveML
Adaptive ML
3 months
Helpfulness is instilled in LLMs as a result of extensive post-training. One approach in particular has been exceptionally successful: Reinforcement Learning from Human Feedback (RLHF). One of the engines of RLHF is Proximal Policy Optimization (PPO).
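The reward-model step of RLHF can be sketched with the standard Bradley-Terry pairwise loss (an illustrative sketch, not Adaptive ML's implementation): human labelers rank pairs of completions, and the reward model is trained so the preferred completion scores higher.

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry loss: -log sigmoid(chosen - rejected)."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(round(preference_loss(2.0, 0.0), 3))  # → 0.127 (model agrees with labeler)
print(round(preference_loss(0.0, 2.0), 3))  # → 2.127 (model disagrees)
```

The resulting scalar reward model is then what PPO optimizes the policy against.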
@AdaptiveML
Adaptive ML
5 months
@lae_teo Welcome to the team!