Ilya Abyzov

@IlyaAbyzov

Followers
4K
Following
5K
Statuses
3K

Ex-ex-engineer building whatever seems funniest at the moment. Co-founder @goforward. Launched uberX & led Uber SF

San Francisco
Joined April 2010
@IlyaAbyzov
Ilya Abyzov
12 days
Inspired by @karpathy and the idea of using games to compare LLMs, I've built a version of the game Codenames where different models are paired in teams to play the game with each other. Fun to see o3-mini team with R1 against Grok and Gemini! Link and repo below.
@karpathy
Andrej Karpathy
12 days
I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals. Playing against another intelligent entity self-balances and adapts difficulty, so each eval (/environment) is leveraged a lot more. There are some early attempts around. Exciting area.
56
228
3K
@IlyaAbyzov
Ilya Abyzov
2 days
[image]
@elonmusk
Elon Musk
3 days
@provisionalidea This retard thinks the government uses SQL
2
0
5
@IlyaAbyzov
Ilya Abyzov
2 days
@tickerBITCOINbb @punk9059 Damn, hope they refunded your lift pass so you at least got to ski free
0
0
2
@IlyaAbyzov
Ilya Abyzov
5 days
Say what you will about the destruction of basic ethics and decency, but if we can cancel some SaaS seats and save 0.00000001% of the federal budget to buy three new rivets on a Boeing strategic bomber instead, it will have been well worth it.
@elonmusk
Elon Musk
5 days
There are tens of millions of media & software subscriptions paid by the federal government – your tax dollars – that show ZERO usage!!
1
0
14
@IlyaAbyzov
Ilya Abyzov
5 days
@gizakdag Love this one
[image]
0
0
3
@IlyaAbyzov
Ilya Abyzov
5 days
Get rekt
[image]
0
0
1
@IlyaAbyzov
Ilya Abyzov
6 days
Aside from being a funny Hail Mary, it shows a trade-off with thinking models: you can’t run this much inference on a massive # of params, so you ablate some useful world knowledge (like that this isn’t how Codenames works) in exchange for clever depth of thought.
0
0
0
@IlyaAbyzov
Ilya Abyzov
7 days
@polynoamial I think I'm two-thirds of the way there with something like this, which I can extend to more types of games. How would you suggest making it a more proper eval? Persistent leaderboards a la Chatbot Arena? Use Elo or something else?
0
0
2
@IlyaAbyzov
Ilya Abyzov
7 days
OpenAI pulled a trick that 100% of people fell for: Deep Research’s score on the test uses live retrieval from the web, which obviously makes it not an apples-to-apples comparison with any static model. They asterisked this in their table and everyone ignored the asterisk.
@tomaspueyo
Tomas Pueyo
8 days
It's coming
[image]
2
0
3
@IlyaAbyzov
Ilya Abyzov
7 days
@MarshallOsborne Sadly, it really was named UBERx at first:
[image]
0
0
0
@IlyaAbyzov
Ilya Abyzov
7 days
@MarshallOsborne o3-mini-high-BLACKx
1
0
0
@IlyaAbyzov
Ilya Abyzov
9 days
@kadikraman That would be great. Was confused about best practices on how to mix tab vs modal routing in my first Expo project. When to present things from left vs bottom, how to use drawers well, how to provide expected back button behavior on screens reachable from different places, etc
0
0
3
@IlyaAbyzov
Ilya Abyzov
9 days
@iamjakestream Pretty sure Travis once said: “It’s like there’s a train coming by with bags of money on it and it’s irresponsible not to take the bags off the train since you don’t know if it’ll come around again.” Definitely works unless it doesn’t.
1
0
3
@IlyaAbyzov
Ilya Abyzov
9 days
@iamjakestream Learned that one the hard way already!
1
0
2
@IlyaAbyzov
Ilya Abyzov
9 days
@iamjakestream Would have taken you up on it, but I’m an AI board game entrepreneur now. $0 in topline but a cool -$80 in EBITDA once I count the API costs. Preseed oversubscribed
1
0
1
@IlyaAbyzov
Ilya Abyzov
9 days
@rauchg They really are! Should I be using Vercel instead of CF? I really like CF workers and Vite but keeping an open mind.
0
1
6
@IlyaAbyzov
Ilya Abyzov
9 days
@ambelamps @devahaz What about special forces guy forced by circumstances to coach his kid's little league team?
1
0
3
@IlyaAbyzov
Ilya Abyzov
10 days
@oscarle_x @PawsMetax Yep, exactly.
0
0
2