Blaze (Balázs Galambosi) @gblazex profile

Blaze (Balázs Galambosi)

@gblazex

Followers: 1K

Following: 4K

Statuses: 4K

A Smooth Guy; Developer of SmoothScroll for macOS, Windows & Google Chrome.

Joined April 2010

Blaze (Balázs Galambosi)

@gblazex

1 year

Looking further into LLM benchmark x-correlations:
- Top row: how each benchmark relates to human judgement (Arena Elo)
- Other rows: any benchmark pair & their relationship
- On the right: samples = # of models tested for each benchmark

thx: @chipro @maximelabonne @ldjconfirmed

Andrej Karpathy

@karpathy

1 year

@AlphaSignalAI @ClementDelangue I pretty much only trust two LLM evals right now: Chatbot Arena and r/LocalLlama comments section

12

48

272

Blaze (Balázs Galambosi)

@gblazex

17 days

RT @adonis_singh: lmao what

0

142

0

Blaze (Balázs Galambosi)

@gblazex

2 months

@kimmonismus The UI is apex legends not COD

0

4

Blaze (Balázs Galambosi)

@gblazex

2 months

RT @lmarena_ai: WebDev Arena Leaderboard is now live with 10K+ votes!
#1. Claude 3.5 Sonnet
#2. Gemini-Exp-1206
#3. Gemini-2.0-Flash
#4. G…

0

103

0

Blaze (Balázs Galambosi)

@gblazex

2 months

RT @JustinLin610: Qwen2.5-Coder is more popular than LLM? Amazing

0

6

0

Blaze (Balázs Galambosi)

@gblazex

2 months

New Arena for audio models

Diyi Yang

@Diyi_Yang

2 months

People like to talk as it's easy and natural. Now that there are Large *Audio* Models 🔊, which model do users like the most? Introducing Talk Arena🎤: an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks. Jump on Talk Arena to compare 🗳️ speech AI🧵 (1/5)

0

1

Blaze (Balázs Galambosi)

@gblazex

2 months

It's a matter of strategy what they prefer to focus on, like Character or Replika has no problem dominating your need for intimate connection. I'd love there to be alternatives. I hope WaveForms can deliver on that promise. Tech won't be an issue.

0

Blaze (Balázs Galambosi)

@gblazex

2 months

The icons are fine. The naming… we all know. But the ordering! 4o Mini is separated from non-mini. It is not ordered by smarts. Seems completely random.

Yuchen Jin

@Yuchenj_UW

2 months

It’s so funny that there is a designer at OpenAI who makes $350K - $20M TC that designed this, got approved, and shipped it:

0

1

Blaze (Balázs Galambosi)

@gblazex

2 months

@brain2_0 @Teknium1 I second this

0

2

Blaze (Balázs Galambosi)

@gblazex

2 months

@nabeelqu Interesting! Wittgenstein seemed to have a lot of insights that are relevant to AI. Do you have any other references?

0

Blaze (Balázs Galambosi)

@gblazex

2 months

@yanndubs @OpenAI way to go Yann!

0

1

Blaze (Balázs Galambosi)

@gblazex

2 months

RT @agromanou: 🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages! Contains *newly-collected* data, pri…

0

60

0

Blaze (Balázs Galambosi)

@gblazex

2 months

RT @AIWarper: TENCENT casually dropping a 13bn parameter open sourced video model WITH weights 🩵 Looks unreal and can't wait to test it ht…

0

201

0

Blaze (Balázs Galambosi)

@gblazex

3 months

banger

harry law (hopfield network truther)

@lawhsw

3 months

in Germany you have to read out the model weights every time you use an LLM

0

Blaze (Balázs Galambosi)

@gblazex

3 months

@nabeelqu followed

0

Blaze (Balázs Galambosi)

@gblazex

3 months

Why is the Google Trends graph so different from this?

swyx 🔜 @aidotEngineer NYC

@swyx

3 months

it cannot be underestimated the extent to which Claude 3.5 Sonnet alone is solely responsible for this 2xing of @AnthropicAI market share absolute home run model

1

0

1

Blaze (Balázs Galambosi)

@gblazex

3 months

@karpathy Ask 10 experts is a huge win. Even better is ask 10 experts to rate 10 alternatives and pick the best one based on key dimensions. Path not traveled in your search for close to optimal solution. It does have all human knowledge but in a shallow sense. Go not farther but faster.

0

1

Blaze (Balázs Galambosi)

@gblazex

3 months

@Teknium1 The amount is staggering. Doesn’t it hurt a lot that it’s all short form content? The cohesion must be very low

1

0

1

Blaze (Balázs Galambosi)

@gblazex

3 months

RT @ilanbigio: turns out you can use <xml/> tags with the realtime api to control tone with _super_ high granularity 🤷🏽‍♂️ @openai devday…

0

35

0

Blaze (Balázs Galambosi)

@gblazex

3 months

Hard real-life coding evals!

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months

Who's the best AI software engineer? Introducing RepoChat Arena: the live AI software engineering battle!🔥🤖
1. Input any public Github link (repo/issue/PR).
2. Ask the models to fix issues, add features or chat with a repo.
3. Vote for the better one and shape the leaderboard!

Watch AI solve your real-world coding tasks live at RepoChat! Exciting use cases in the thread below🧵

0

2