![Evgenii Zheltonozhskii Profile](https://pbs.twimg.com/profile_images/1475303239583244293/ER0ecUAp_x96.jpg)
Evgenii Zheltonozhskii
@evgeniyzhe
Followers: 529 · Following: 37K · Statuses: 1K
Physics Ph.D. student (topological condensed matter) at @TechnionLive. MSc in computer science (self-supervised learning) @[email protected]
Joined April 2016
@alexwei_ To be clear, I think this is very impressive and the performance is still very high, almost surely red, and you did a very good job of evaluating it. But skimming through the hardest problems it solved, it looks like I could solve ~half of them, and I was low orange at my best.
@thgisorp @HKydlicek Nope, pass@1 at N samples is just the number of correct solutions divided by the number of attempts
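A minimal sketch of that estimator, assuming per-attempt pass/fail judgments (the function name and structure are illustrative, not from any particular eval harness):

```python
def pass_at_1(results: list[bool]) -> float:
    """Estimate pass@1 from N sampled attempts at the same problem.

    With N independent samples, the pass@1 estimate is simply the
    fraction of attempts that were judged correct.
    """
    if not results:
        raise ValueError("need at least one attempt")
    return sum(results) / len(results)

# e.g. 3 correct solutions out of 8 attempts -> pass@1 estimate of 0.375
print(pass_at_1([True, False, True, False, False, True, False, False]))
```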
@giffmana @DimitrisPapail What I mean is that there is always some chance of missing an existing problem / not finding it. In other words, the contamination can be both intentional and unintentional at this level. I don't think they allow intentional contamination at the IMO level
@giffmana @DimitrisPapail For high-level olympiads, yes (more or less), but you can't really know that nobody came up with the same question before (you check and all, but you may still miss something). The criteria are less strict for lower-level olympiads
@LiJonassen Looks interesting! But you've got to evaluate on closed-source SOTA models (at least o1, o3-mini, Gemini 2) for the benchmark to be relevant
RT @JonathanRuhman: Asa was one of the greatest scientists in Israel. A theoretical physicist. Before moving to the Technion, he served as a faculty member at Boston University. He brought to Israel the field known as…
@MarkNeumannnn But each observation has 30 different options, while each page has only 5 states. I think this one is OK, even though it could be written better
@DanHendrycks Can you share the distribution of accuracy by category for the models you tested? Even better: the full completions
@y_m_asano @_rohitgirdhar_ But, for example, mammals can't see if they weren't exposed to visual signals during the critical period early in life, even if given eyes later, as shown by Wiesel & Hubel
@giffmana @roydanroy , then it should all fit perfectly, as long as "everything" is allowed to be hyphenated. No biggie
@PhtonHton @ArthurD3791 @phill__1 @kalomaze Training compute-optimally means getting the highest performance possible given a fixed compute budget. Chinchilla makes perfect sense and is correct, up to a trivial extension for inference compute. It's not the paper's fault that people are misinterpreting its message
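A rough sketch of the compute-optimal idea, assuming the common C ≈ 6·N·D FLOPs approximation and the Chinchilla-style rule of thumb of ~20 training tokens per parameter (both are approximations used here for illustration, not exact figures from the paper):

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a fixed training-compute budget between model size and data.

    Assumes C ~= 6 * N * D FLOPs and the rule of thumb
    D ~= tokens_per_param * N, which gives C ~= 6 * tokens_per_param * N**2.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# e.g. a 1e23 FLOP budget -> roughly a 29B-parameter model on ~0.6T tokens
n, d = chinchilla_optimal(1e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions, the "optimal" point only says how to spend a fixed training budget; accounting for inference cost shifts the trade-off toward smaller models trained on more tokens, which is the "trivial extension" mentioned above.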
@PaglieriDavide @OpenAI @GoogleDeepMind @grok Maybe you should ask for credits and run the eval yourself. Companies are more likely to give you compute time for free than to spend their employees' time.