🔭 Galileo

@rungalileo

Followers

782

Following

174

Statuses

488

Generative AI Evaluation, Experimentation, and Observability Platform

SF & NYC

Joined June 2021

🔭 Galileo

@rungalileo

15 hours

📊 Our Agent Leaderboard is 𝗹𝗶𝘃𝗲! We built a comprehensive benchmark of which LLMs work best for AI Agents 👀

After evaluating 17 leading LLMs across 14 diverse datasets, we're excited to share our findings about which models truly excel at tool-calling and are ready to power AI agents to solve 𝘳𝘦𝘢𝘭-𝘸𝘰𝘳𝘭𝘥 𝘱𝘳𝘰𝘣𝘭𝘦𝘮𝘴 effectively.

Key discoveries:
🏆 @Google's 𝗚𝗲𝗺𝗶𝗻𝗶-𝟮.𝟬-𝗳𝗹𝗮𝘀𝗵 𝗱𝗼𝗺𝗶𝗻𝗮𝘁𝗲𝘀 with a 0.938 score at remarkably low cost
💸 The top 3 models span a 10𝘹 𝘱𝘳𝘪𝘤𝘦 𝘥𝘪𝘧𝘧𝘦𝘳𝘦𝘯𝘤𝘦 with only 4% performance gap: 𝘀𝗼𝗺𝗲 𝗼𝗳 𝘆𝗼𝘂 𝗮𝗿𝗲 𝗼𝘃𝗲𝗿𝗽𝗮𝘆𝗶𝗻𝗴!
🛠 @MistralAI's Mistral-small-2501 𝗹𝗲𝗮𝗱𝘀 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 options, matching GPT-4o-mini at 0.832
❌ 𝗦𝘂𝗿𝗽𝗿𝗶𝘀𝗲 𝗳𝗮𝗶𝗹𝘂𝗿𝗲: @deepseek_ai V3 and R1 didn't make the rankings due to limited function calling support, making them ineffective for enabling AI agents to leverage tools

Get more insights, dive into the full analysis and explore the interactive leaderboard on @huggingface:

Which LLM are you using for your AI agents? Are you getting the best value for your spend? 🤔

2

9

27

🔭 Galileo

@rungalileo

7 hours

@ConorBronsdon @OfficialLoganK Plus, check out the agent leaderboard:

0

1

🔭 Galileo

@rungalileo

13 hours

RT @ConorBronsdon: The data is clear: @GoogleAI's Gemini-2.0-flash dominates AI agent capabilities with a 0.938 score on @rungalileo's new…

0

2

0

🔭 Galileo

@rungalileo

15 hours

RT @nlpguy_: 💥 Launching the 𝗔𝗴𝗲𝗻𝘁 𝗟𝗲𝗮𝗱𝗲𝗿𝗯𝗼𝗮𝗿𝗱 on @huggingface! Our ranking of top open and closed-source models revealed some surprisin…

0

8

0

🔭 Galileo

@rungalileo

15 hours

Kudos to @nlpguy_ for driving this effort - you can dig into the details below 👇
Leaderboard on Hugging Face:
GitHub:
Explainer blog:
Dataset:

0

2

🔭 Galileo

@rungalileo

2 days

✨ 🆕 CLHF (Continuous Learning w/ Human Feedback) available on Galileo.

0

1

🔭 Galileo

@rungalileo

5 days

@itsaydrian @erinmikail Fantastic event! Thank you both for putting this together 🤝

0

2

🔭 Galileo

@rungalileo

6 days

@nickytonline @Ace_KYD @erinmikail Hope to catch you at a future event!

0

1

🔭 Galileo

@rungalileo

6 days

@Ace_KYD @erinmikail Thanks for coming out @Ace_KYD!!! We're glad you had fun! We're big fans of your work at @TBD54566975

0

🔭 Galileo

@rungalileo

6 days

RT @Ace_KYD: Having a blast here at the ChatGPT Roulette hosted by @erinmikail ✨

0

2

0

🔭 Galileo

@rungalileo

8 days

We can't wait for the @aiDotEngineer Summit! Excited to be sponsoring the event - swing by our booth, or join us for happy hour on us after the event:

AI Engineer

@aiDotEngineer

10 days

We're excited to announce our expo partners for Summit! Come meet the founders and leading engineers at these companies leading support & innovation in the world of AI Engineering: @solana @Sourcegraph @rungalileo @basetenco @HasuraHQ @datadoghq @windsurf_ai @Get_Writer @weights_biases @ellipsis_dev @elevenlabsio @gitpod @vellum_ai @LangbaseInc @PortkeyAI @daytonaio @trydaily

0

4

🔭 Galileo

@rungalileo

8 days

@aiDotEngineer @solana @Sourcegraph @basetenco @HasuraHQ @datadoghq @windsurf_ai @Get_Writer We can't wait for the AI Engineering Summit! Excited to be sponsoring the event - swing by our booth, or join us for happy hour on us after the event:

0

2

🔭 Galileo

@rungalileo

8 days

🔍 Want to build your own AI research agent with @OpenAI's Deep Research? We'll show you how.

@nlpguy_ built a comprehensive guide to building, deploying, and evaluating your own research agent. We break down:
⚡ How to construct a Deep Research agent using o3-mini and 4o
📝 Step-by-step implementation with real code examples
📊 Advanced evaluation techniques for measuring agent performance
🔬 Practical insights on improving agent reliability

Here's how:

0

1

🔭 Galileo

@rungalileo

8 days

RT @swyx:

0

1

0

🔭 Galileo

@rungalileo

9 days

RSVP to developer drinkup at @aiDotEngineer

0

🔭 Galileo

@rungalileo

10 days

RT @TheTuringPost: Mastering AI Agents, a free 100-pages eBook from @rungalileo 👇 This guide covers: - Agent types - Their applications -…

0

6

0

🔭 Galileo

@rungalileo

10 days

See you all at the @aiDotEngineer Summit in NYC!

0

1

2