![🔭 Galileo Profile](https://pbs.twimg.com/profile_images/1600695781849059328/7OYpcgks_x96.jpg)
🔭 Galileo
@rungalileo
Followers
782
Following
174
Statuses
488
Generative AI Evaluation, Experimentation, and Observability Platform
SF & NYC
Joined June 2021
Our Agent Leaderboard is live! We built a comprehensive benchmark of which LLMs work best for AI Agents. After evaluating 17 leading LLMs across 14 diverse datasets, we're excited to share our findings about which models truly excel at tool-calling and are ready to power AI agents to solve real-world problems effectively. Key discoveries: @Google's Gemini-2.0-flash dominates with a 0.938 score at remarkably low cost. The top 3 models span a 10x price difference with only a 4% performance gap: Some of you are overpaying! @MistralAI's Mistral-small-2501 leads open-source options, matching GPT-4o-mini at 0.832. Surprise failure: @deepseek_ai V3 and R1 didn't make the rankings due to limited function calling support, making them ineffective for enabling AI agents to leverage tools. Get more insights, dive into the full analysis and explore the interactive leaderboard on @huggingface: Which LLM are you using for your AI agents? Are you getting the best value for your spend?
2
9
27
RT @ConorBronsdon: The data is clear: @GoogleAI's Gemini-2.0-flash dominates AI agent capabilities with a 0.938 score on @rungalileo's new…
0
2
0
RT @nlpguy_: Launching the Agent Leaderboard on @huggingface! Our ranking of top open and closed-source models revealed some surprisin…
0
8
0
Kudos to @nlpguy_ for driving this effort - you can dig into the details below. Leaderboard on Hugging Face: GitHub: Explainer blog: Dataset:
0
0
2
@Ace_KYD @erinmikail Thanks for coming out @Ace_KYD!!! We're glad you had fun! We're big fans of your work at @TBD54566975
0
0
0
We can't wait for the @aiDotEngineer Summit! Excited to be sponsoring the event - swing by our booth, or join us for happy hour on us after the event:
We're excited to announce our expo partners for Summit! Come meet the founders and leading engineers at these companies leading support & innovation in the world of AI Engineering:
@solana
@Sourcegraph
@rungalileo
@basetenco
@HasuraHQ
@datadoghq
@windsurf_ai
@Get_Writer
@weights_biases
@ellipsis_dev
@elevenlabsio
@gitpod
@vellum_ai
@LangbaseInc
@PortkeyAI
@daytonaio
@trydaily
0
0
4
@aiDotEngineer @solana @Sourcegraph @basetenco @HasuraHQ @datadoghq @windsurf_ai @Get_Writer We can't wait for the AI Engineering Summit! Excited to be sponsoring the event - swing by our booth, or join us for happy hour on us after the event:
0
0
2
Want to build your own AI research agent with @OpenAI's Deep Research? We'll show you how. @nlpguy_ built a comprehensive guide to building, deploying, and evaluating your own research agent. We break down how to construct a Deep Research agent using o3-mini and 4o; step-by-step implementation with real code examples; advanced evaluation techniques for measuring agent performance; and practical insights on improving agent reliability. Here's how:
0
1
1
RT @TheTuringPost: Mastering AI Agents, a free 100-page eBook from @rungalileo. This guide covers: - Agent types - Their applications -…
0
6
0
See you all at the @aiDotEngineer Summit in NYC!
0
1
2