will brown

@willccbb

Followers
12K
Following
18K
Statuses
3K

ai research @morganstanley | prev phd @columbia bs/ms @penn

nyc
Joined February 2015
@willccbb
will brown
10 days
environment engineering is the new rubric engineering
@willccbb
will brown
18 days
rubric engineering is the new prompt engineering
2
1
38
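To make the quip concrete: a minimal sketch of what a rubric-style reward could look like — a weighted list of individually checkable criteria. All names here are hypothetical illustrations, not any particular library's API.

```python
# Hypothetical sketch of "rubric engineering": score a completion against
# a list of weighted, individually checkable criteria.

def has_answer_tags(completion: str) -> float:
    # format criterion: reward wrapping the final answer in tags
    return 1.0 if "<answer>" in completion and "</answer>" in completion else 0.0

def is_concise(completion: str) -> float:
    # brevity criterion: penalize completions past 2000 characters
    return 1.0 if len(completion) <= 2000 else 0.0

RUBRIC = [(0.5, has_answer_tags),  # (weight, criterion)
          (0.5, is_concise)]

def rubric_reward(completion: str) -> float:
    # total reward is the weighted sum over all rubric criteria
    return sum(w * criterion(completion) for w, criterion in RUBRIC)

print(rubric_reward("<answer>42</answer>"))  # 1.0: both criteria pass
```

The point of the format is that each criterion is cheap to check and cheap to swap out, which is what makes it "engineering" rather than a fixed reward.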
@willccbb
will brown
7 hours
@teortaxesTex @PrimeIntellect @georgejrjrjr not all verifiable but decent starting point
0
0
7
@willccbb
will brown
8 hours
@teortaxesTex my interpretation of the technique in this paper is that you basically want more momentum and/or higher effective batch size. a gradient only holds so many bits; increasing the # of examples it needs to learn about focuses those bits toward generalization
0
0
5
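A toy reading of the "higher effective batch size" point — my sketch, not the paper's method: accumulate gradients over more examples before each update, so a single update's limited "bits" must fit all of them at once.

```python
import numpy as np

# Toy sketch: raising effective batch size via gradient accumulation.
# Each update averages gradients over `accum_steps` fresh examples.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])  # target weights for a linear model

def grad(w, x, y):
    # gradient of squared error for the linear model y ~ x @ w
    return 2 * x * (x @ w - y)

w = np.zeros(2)
accum_steps = 8   # "effective batch size" multiplier
lr = 0.05

for step in range(200):
    g = np.zeros(2)
    for _ in range(accum_steps):       # accumulate over many examples
        x = rng.normal(size=2)
        y = x @ w_true
        g += grad(w, x, y)
    w -= lr * g / accum_steps          # one update carries info from all of them

print(np.round(w, 2))  # close to [2, -1]
```

Larger `accum_steps` makes each update less noisy, which is one way to cash out the intuition that the same number of update bits gets spread over more evidence.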
@willccbb
will brown
10 hours
@boazbaraktcs @jeremyphoward the exact opposite is true -- inference-time compute is provably sufficient to solve all problems solvable by any circuit, with steps scaling linearly in circuit size, given constant depth + log(size) embedding dim
2
0
2
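A toy illustration of the linear-steps claim. The actual result is about transformers generating a chain of thought; this just shows the evaluation order it relies on — one gate per "reasoning step", so steps scale linearly with circuit size. The circuit here is an arbitrary example.

```python
# Evaluate a boolean circuit one gate per step: total steps are linear
# in circuit size, mirroring the chain-of-thought simulation argument.

# gate list in topological order: (gate name, op, input wire names)
CIRCUIT = [
    ("g1", "AND", ("x0", "x1")),
    ("g2", "OR",  ("x1", "x2")),
    ("g3", "XOR", ("g1", "g2")),   # output gate
]

OPS = {"AND": lambda a, b: a & b,
       "OR":  lambda a, b: a | b,
       "XOR": lambda a, b: a ^ b}

def eval_circuit(inputs: dict) -> tuple:
    wires = dict(inputs)           # wire name -> current value
    steps = 0
    for name, op, (a, b) in CIRCUIT:
        wires[name] = OPS[op](wires[a], wires[b])
        steps += 1                 # one step per gate: linear in size
    return wires[CIRCUIT[-1][0]], steps

out, steps = eval_circuit({"x0": 1, "x1": 1, "x2": 0})
print(out, steps)  # g1=1, g2=1, g3=1^1=0 -> output 0 after 3 steps
```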
@willccbb
will brown
12 hours
@signalgaining the point isn't about using models to do rote calculations, it's about a more general paradigm of learning to solve increasingly hard problems without needing tons of solution data
1
0
3
@willccbb
will brown
12 hours
@wordgrammer it would be pretty unusual if the CEO of a trillion dollar company was 20 years old
2
0
28
@willccbb
will brown
17 hours
@Joanvelja that's a good point, if 10x-ing (?) RL lets you 0.1x test-time compute while still improving accuracy in general, that'd be awesome
1
0
3
@willccbb
will brown
17 hours
RT @leonardtang_: i've been entirely consumed these past few weeks by the LLM-as-a-judge research agenda. there's lots of great work, but…
0
17
0
@willccbb
will brown
17 hours
@Joanvelja they're still sampling 1K solutions for IOI and submitting 50. how good is o3 if you sample + submit 1 solution?
[image attached]
2
0
3
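For context on the sample-vs-submit gap: the standard unbiased pass@k estimator (popularized by the Codex/HumanEval evaluation) makes the difference explicit. The sample counts below are illustrative, not o3's actual numbers.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # unbiased pass@k: given n sampled solutions of which c are correct,
    # probability that at least one of k randomly chosen samples passes
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some chosen one passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# illustrative: 1000 samples, 40 correct
print(round(pass_at_k(1000, 40, 1), 3))   # 0.04 — sample + submit 1
print(round(pass_at_k(1000, 40, 50), 3))  # much higher when submitting 50
```

The gap between the two numbers is exactly what the tweet is pointing at: submitting 50 candidates hides how weak any single sample is.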
@willccbb
will brown
20 hours
@StateSpeed_AB @casper_hansen_ different algorithms that do different things. GRPO is training on many distinct rollouts per prompt + computing relative advantages; sample count isn't apples-to-apples for “training tokens”
1
0
0
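A sketch of the group-relative advantage computation the tweet refers to: GRPO samples a group of rollouts per prompt, then scores each rollout by its reward relative to the group (mean-centered and std-normalized). The reward values here are made up for illustration.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    # rewards: shape (num_prompts, rollouts_per_prompt)
    # each rollout's advantage is its reward relative to its own group
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8  # avoid div-by-zero
    return (rewards - mean) / std

rewards = np.array([[1.0, 0.0, 0.0, 1.0],   # prompt A: two rollouts succeed
                    [0.0, 0.0, 0.0, 1.0]])  # prompt B: one rollout succeeds
print(group_relative_advantages(rewards).round(2))
```

This is why raw sample counts don't compare cleanly across algorithms: each GRPO prompt consumes many rollouts, and only their relative ordering within the group drives the update.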
@willccbb
will brown
21 hours
@DavidFSWD oh just the latter haha (single node)
0
0
0
@willccbb
will brown
1 day
@TheXeophon @iScienceLuvr code execution
[image attached]
1
0
1
@willccbb
will brown
1 day
@TheXeophon i can only dream
0
0
1
@willccbb
will brown
1 day
the “wow that’s crazy” benchmark is impossible to saturate until progress truly plateaus. it’s where you get a model to do something that makes people go “wow that’s crazy”
0
0
20
@willccbb
will brown
1 day
@TheXeophon they're a lot of fun to read sometimes
0
0
6
@willccbb
will brown
1 day
@jchencxh @rogerw0108 vLLM as in @vllm_project not Vision-Language Models
1
0
2
@willccbb
will brown
1 day
1
0
1