Blaze (Balázs Galambosi)

@gblazex

Followers
1K
Following
4K
Statuses
4K

A Smooth Guy; Developer of SmoothScroll for macOS, Windows & Google Chrome.

Joined April 2010
@gblazex
Blaze (Balázs Galambosi)
1 year
Looking further into LLM benchmark x-correlations:
- Top row: how each benchmark relates to human judgement (Arena Elo)
- Other rows: any benchmark pair & their relationship
- On the right: samples = # of models tested for each benchmark
thx: @chipro @maximelabonne @ldjconfirmed
Tweet media one
@karpathy
Andrej Karpathy
1 year
@AlphaSignalAI @ClementDelangue I pretty much only trust two LLM evals right now: Chatbot Arena and r/LocalLlama comments section
12
48
272
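The benchmark cross-correlation idea in the post above (every benchmark against Arena Elo, every benchmark against every other, plus a sample count per pair) can be sketched roughly as follows. This is only an illustration, not the method used for the chart: the scores are invented, the names `benchmark_x` and `benchmark_y` are hypothetical, and Pearson correlation over the models shared by each pair is an assumption about how such a matrix might be built.

```python
from itertools import combinations
from math import sqrt

# Hypothetical scores, invented for illustration only (not real leaderboard data).
# None marks a model that was not tested on that benchmark.
scores = {
    "arena_elo":   {"model_a": 1250, "model_b": 1180, "model_c": 1100, "model_d": 1050},
    "benchmark_x": {"model_a": 9.0, "model_b": 8.1, "model_c": 7.4, "model_d": None},
    "benchmark_y": {"model_a": 70.0, "model_b": 72.0, "model_c": 61.0, "model_d": 55.0},
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(table, min_samples=3):
    """Return {(bench_a, bench_b): (r, n)} using only models scored on both."""
    out = {}
    for a, b in combinations(table, 2):
        shared = [
            m for m, value in table[a].items()
            if value is not None and table[b].get(m) is not None
        ]
        if len(shared) < min_samples:
            continue  # too few common models for a meaningful correlation
        r = pearson([table[a][m] for m in shared], [table[b][m] for m in shared])
        out[(a, b)] = (r, len(shared))
    return out


results = pairwise_correlations(scores)
for (a, b), (r, n) in results.items():
    print(f"{a} vs {b}: r={r:.2f} (samples={n})")
```

Reporting the sample count next to each coefficient matters because pairs with only a few shared models produce unstable correlations.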
@gblazex
Blaze (Balázs Galambosi)
17 days
RT @adonis_singh: lmao what
Tweet media one
0
142
0
@gblazex
Blaze (Balázs Galambosi)
2 months
@kimmonismus The UI is apex legends not COD
0
0
4
@gblazex
Blaze (Balázs Galambosi)
2 months
RT @lmarena_ai: WebDev Arena Leaderboard is now live with 10K+ votes! #1. Claude 3.5 Sonnet #2. Gemini-Exp-1206 #3. Gemini-2.0-Flash #4. G…
0
103
0
@gblazex
Blaze (Balázs Galambosi)
2 months
RT @JustinLin610: Qwen2.5-Coder is more popular than LLM? Amazing
0
6
0
@gblazex
Blaze (Balázs Galambosi)
2 months
New Arena for audio models
@Diyi_Yang
Diyi Yang
2 months
People like to talk as it's easy and natural. Now that there are Large *Audio* Models 🔊, which model do users like the most? Introducing Talk Arena🎤: an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks. Jump on Talk Arena to compare 🗳️ speech AI🧵 (1/5)
Tweet media one
0
0
1
@gblazex
Blaze (Balázs Galambosi)
2 months
It's a matter of strategy what they prefer to focus on; Character or Replika, for example, have no problem dominating your need for intimate connection. I'd love there to be alternatives. I hope WaveForms can deliver on that promise. Tech won't be an issue.
0
0
0
@gblazex
Blaze (Balázs Galambosi)
2 months
The icons are fine. The naming… we all know. But the ordering! 4o Mini is separated from non-mini. It is not ordered by smarts. Seems completely random.
@Yuchenj_UW
Yuchen Jin
2 months
It’s so funny that there is a designer at OpenAI who makes $350K - $20M TC that designed this, got approved, and shipped it:
Tweet media one
0
0
1
@gblazex
Blaze (Balázs Galambosi)
2 months
@brain2_0 @Teknium1 I second this
0
0
2
@gblazex
Blaze (Balázs Galambosi)
2 months
@nabeelqu Interesting! Wittgenstein seemed to have a lot of insights that are relevant to AI. Do you have any other references?
0
0
0
@gblazex
Blaze (Balázs Galambosi)
2 months
@yanndubs @OpenAI way to go Yann!
0
0
1
@gblazex
Blaze (Balázs Galambosi)
2 months
RT @agromanou: 🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages! Contains *newly-collected* data, pri…
0
60
0
@gblazex
Blaze (Balázs Galambosi)
2 months
RT @AIWarper: TENCENT casually dropping a 13bn parameter open sourced video model WITH weights 🩵 Looks unreal and can't wait to test it ht…
0
201
0
@gblazex
Blaze (Balázs Galambosi)
3 months
banger
@lawhsw
harry law (hopfield network truther)
3 months
in Germany you have to read out the model weights every time you use an LLM
0
0
0
@gblazex
Blaze (Balázs Galambosi)
3 months
@nabeelqu followed
0
0
0
@gblazex
Blaze (Balázs Galambosi)
3 months
Why is the Google Trends graph so different from this?
@swyx
swyx 🔜 @aidotEngineer NYC
3 months
it cannot be underestimated the extent to which Claude 3.5 Sonnet alone is solely responsible for this 2xing of @AnthropicAI market share absolute home run model
Tweet media one
1
0
1
@gblazex
Blaze (Balázs Galambosi)
3 months
@karpathy Ask 10 experts is a huge win. Even better is ask 10 experts to rate 10 alternatives and pick the best one based on key dimensions. Path not traveled in your search for close to optimal solution. It does have all human knowledge but in a shallow sense. Go not farther but faster.
0
0
1
@gblazex
Blaze (Balázs Galambosi)
3 months
@Teknium1 The amount is staggering. Doesn’t it hurt a lot that it’s all short form content? The cohesion must be very low
1
0
1
@gblazex
Blaze (Balázs Galambosi)
3 months
RT @ilanbigio: turns out you can use <xml/> tags with the realtime api to control tone with _super_ high granularity 🤷🏽‍♂️ @openai devday…
0
35
0
@gblazex
Blaze (Balázs Galambosi)
3 months
Hard real-life coding evals!
@lmarena_ai
lmarena.ai (formerly lmsys.org)
3 months
Who's the best AI software engineer? Introducing RepoChat Arena: the live AI software engineering battle!🔥🤖
1. Input any public Github link (repo/issue/PR).
2. Ask the models to fix issues, add features or chat with a repo.
3. Vote for the better one and shape the leaderboard!
Watch AI solve your real-world coding tasks live at RepoChat! Exciting use cases in the thread below🧵
0
0
2