![Evgenii Zheltonozhskii Profile](https://pbs.twimg.com/profile_images/1475303239583244293/ER0ecUAp_x96.jpg)
Evgenii Zheltonozhskii
@evgeniyzhe
Followers: 529 · Following: 37K · Statuses: 1K
Physics Ph.D. student (topological condensed matter) at @TechnionLive. MSc in computer science (self-supervised learning) @[email protected]
Joined April 2016
@alexwei_ To be clear, I think this is very impressive and the performance is still very high, almost surely red, and you did a very good job of evaluating it. But skimming through the hardest problems it solved, it looks like I could solve ~half of them, and I was low orange at my best.
@thgisorp @HKydlicek Nope, pass@1 at N samples is just the number of correct solutions divided by the number of attempts
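A minimal sketch of that estimator, assuming per-attempt pass/fail judgments (the function name and structure are illustrative, not from any particular eval harness):

```python
def pass_at_1(results: list[bool]) -> float:
    """Estimate pass@1 from N sampled attempts at the same problem.

    With N independent samples, the pass@1 estimate is simply the
    fraction of attempts that were judged correct.
    """
    if not results:
        raise ValueError("need at least one attempt")
    return sum(results) / len(results)

# e.g. 3 correct solutions out of 8 attempts -> pass@1 estimate of 0.375
print(pass_at_1([True, False, True, False, False, True, False, False]))
```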
@giffmana @DimitrisPapail What I mean is that there is always some chance of missing an existing problem / not finding it. In other words, the contamination can be both intentional and unintentional at this level. I don't think they allow intentional contamination at the IMO level
@giffmana @DimitrisPapail For high-level olympiads, yes (more or less), but you can't really know that nobody came up with the same question before (you check and all, but you may still miss something). The criteria are less strict for lower-level olympiads
@LiJonassen Looks interesting! But you've got to evaluate on closed-source SOTA models (at least o1, o3-mini, Gemini 2) for the benchmark to be relevant
RT @JonathanRuhman: Asa was one of the greatest scientists in Israel. A theoretical physicist. Before moving to the Technion, he served as a faculty member at Boston University. He brought to Israel the field known as…
@MarkNeumannnn But each observation has 30 different options, while each page has only 5 states. I think this one is OK, even though it could be written better
@DanHendrycks Can you share the distribution of accuracy by category for the models you tested? Even better: the full completions
@y_m_asano @_rohitgirdhar_ But, for example, mammals can't see if they weren't exposed to visual signals during the critical period early in life, even if given eyes later, as shown by Wiesel & Hubel
@giffmana @roydanroy , then it should all fit perfectly, as long as "everything" is allowed to be hyphenated. No biggie
@PhtonHton @ArthurD3791 @phill__1 @kalomaze Training compute-optimally means getting the highest performance possible given a fixed compute budget. Chinchilla makes perfect sense and is correct, up to a trivial extension for inference compute. It's not the paper's fault that people are misinterpreting its message
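A rough sketch of the compute-optimal idea, assuming the common C ≈ 6·N·D FLOPs approximation and the Chinchilla-style rule of thumb of ~20 training tokens per parameter (both are approximations used here for illustration, not exact figures from the paper):

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a fixed training-compute budget between model size and data.

    Assumes C ~= 6 * N * D FLOPs and the rule of thumb
    D ~= tokens_per_param * N, which gives C ~= 6 * tokens_per_param * N**2.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# e.g. a 1e23 FLOP budget -> roughly a 29B-parameter model on ~0.6T tokens
n, d = chinchilla_optimal(1e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions, the "optimal" point only says how to spend a fixed training budget; accounting for inference cost shifts the trade-off toward smaller models trained on more tokens, which is the "trivial extension" mentioned above.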
@PaglieriDavide @OpenAI @GoogleDeepMind @grok Maybe you should ask for credits and run the eval yourself. Companies are more likely to give you compute time for free than to spend their employees' time.