🔭 Galileo

@rungalileo

Followers
782
Following
174
Statuses
488

Generative AI Evaluation, Experimentation, and Observability Platform

SF & NYC
Joined June 2021
@rungalileo
🔭 Galileo
15 hours
📊 Our Agent Leaderboard is live! We built a comprehensive benchmark of which LLMs work best for AI Agents 👀
After evaluating 17 leading LLMs across 14 diverse datasets, we're excited to share our findings about which models truly excel at tool-calling and are ready to power AI agents to solve real-world problems effectively.
Key discoveries:
🏆 @Google's Gemini-2.0-flash dominates with a 0.938 score at remarkably low cost
💸 The top 3 models span a 10x price difference with only a 4% performance gap: some of you are overpaying!
🛠 @MistralAI's Mistral-small-2501 leads open-source options, matching GPT-4o-mini at 0.832
❌ Surprise failure: @deepseek_ai V3 and R1 didn't make the rankings due to limited function calling support, making them ineffective for enabling AI agents to leverage tools
Get more insights, dive into the full analysis and explore the interactive leaderboard on @huggingface:
Which LLM are you using for your AI agents? Are you getting the best value for your spend? 🤔
2
9
27
@rungalileo
🔭 Galileo
7 hours
@ConorBronsdon @OfficialLoganK Plus, check out the agent leaderboard:
0
0
1
@rungalileo
🔭 Galileo
13 hours
RT @ConorBronsdon: The data is clear: @GoogleAI's Gemini-2.0-flash dominates AI agent capabilities with a 0.938 score on @rungalileo's new…
0
2
0
@rungalileo
🔭 Galileo
15 hours
RT @nlpguy_: 💥 Launching the Agent Leaderboard on @huggingface! Our ranking of top open and closed-source models revealed some surprisin…
0
8
0
@rungalileo
🔭 Galileo
15 hours
Kudos to @nlpguy_ for driving this effort - you can dig into the details below 👇 Leaderboard on Hugging Face: GitHub: Explainer blog: Dataset:
0
0
2
@rungalileo
🔭 Galileo
2 days
✨ 🆕 CLHF (Continuous Learning w/ Human Feedback) available on Galileo.
0
0
1
@rungalileo
🔭 Galileo
5 days
@itsaydrian @erinmikail Fantastic event! Thank you for putting this together, both of you 🤝
0
0
2
@rungalileo
🔭 Galileo
6 days
@nickytonline @Ace_KYD @erinmikail Hope to catch you at a future event!
0
0
1
@rungalileo
🔭 Galileo
6 days
@Ace_KYD @erinmikail Thanks for coming out @Ace_KYD!!! We're glad you had fun! We're big fans of your work at @TBD54566975
0
0
0
@rungalileo
🔭 Galileo
6 days
RT @Ace_KYD: Having a blast here at the ChatGPT Roulette hosted by @erinmikail ✨
0
2
0
@rungalileo
🔭 Galileo
8 days
We can't wait for the @aiDotEngineer Summit! Excited to be sponsoring the event - swing by our booth, or join us for happy hour on us after the event:
@aiDotEngineer
AI Engineer
10 days
We're excited to announce our expo partners for Summit! Come meet the founders and leading engineers at these companies leading support & innovation in the world of AI Engineering: @solana @Sourcegraph @rungalileo @basetenco @HasuraHQ @datadoghq @windsurf_ai @Get_Writer @weights_biases @ellipsis_dev @elevenlabsio @gitpod @vellum_ai @LangbaseInc @PortkeyAI @daytonaio @trydaily
0
0
4
@rungalileo
🔭 Galileo
8 days
@aiDotEngineer @solana @Sourcegraph @basetenco @HasuraHQ @datadoghq @windsurf_ai @Get_Writer We can't wait for the AI Engineering Summit! Excited to be sponsoring the event - swing by our booth, or join us for happy hour on us after the event:
0
0
2
@rungalileo
🔭 Galileo
8 days
🔍 Want to build your own AI research agent with @OpenAI's Deep Research? We'll show you how.
@nlpguy_ built a comprehensive guide to building, deploying, and evaluating your own research agent. We break down:
⚡ How to construct a Deep Research agent using o3-mini and 4o
📝 Step-by-step implementation with real code examples
📊 Advanced evaluation techniques for measuring agent performance
🔬 Practical insights on improving agent reliability
Here's how:
0
1
1
@rungalileo
🔭 Galileo
8 days
RT @swyx:
0
1
0
@rungalileo
🔭 Galileo
9 days
RSVP to developer drinkup at @aiDotEngineer
0
0
0
@rungalileo
🔭 Galileo
10 days
RT @TheTuringPost: Mastering AI Agents, a free 100-page eBook from @rungalileo 👇 This guide covers: - Agent types - Their applications -…
0
6
0
@rungalileo
🔭 Galileo
10 days
See you all at the @aiDotEngineer Summit in NYC!
@aiDotEngineer
AI Engineer
10 days
We're excited to announce our expo partners for Summit! Come meet the founders and leading engineers at these companies leading support & innovation in the world of AI Engineering: @solana @Sourcegraph @rungalileo @basetenco @HasuraHQ @datadoghq @windsurf_ai @Get_Writer @weights_biases @ellipsis_dev @elevenlabsio @gitpod @vellum_ai @LangbaseInc @PortkeyAI @daytonaio @trydaily
0
1
2