will brown

@willccbb

Followers
12K
Following
18K
Statuses
3K

ai research @morganstanley | prev phd @columbia bs/ms @penn

nyc
Joined February 2015
@willccbb
will brown
10 days
environment engineering is the new rubric engineering
@willccbb
will brown
18 days
rubric engineering is the new prompt engineering
2
1
38
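To make the quip concrete: a minimal sketch of what a rubric-style reward could look like — a weighted list of individually checkable criteria. All names here are hypothetical illustrations, not any particular library's API.

```python
# Hypothetical sketch of "rubric engineering": score a completion against
# a list of weighted, individually checkable criteria.

def has_answer_tags(completion: str) -> float:
    # format criterion: reward wrapping the final answer in tags
    return 1.0 if "<answer>" in completion and "</answer>" in completion else 0.0

def is_concise(completion: str) -> float:
    # brevity criterion: penalize completions past 2000 characters
    return 1.0 if len(completion) <= 2000 else 0.0

RUBRIC = [(0.5, has_answer_tags),  # (weight, criterion)
          (0.5, is_concise)]

def rubric_reward(completion: str) -> float:
    # total reward is the weighted sum over all rubric criteria
    return sum(w * criterion(completion) for w, criterion in RUBRIC)

print(rubric_reward("<answer>42</answer>"))  # 1.0: both criteria pass
```

The point of the format is that each criterion is cheap to check and cheap to swap out, which is what makes it "engineering" rather than a fixed reward.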
@willccbb
will brown
7 hours
@teortaxesTex @PrimeIntellect @georgejrjrjr not all verifiable but decent starting point
0
0
7
@willccbb
will brown
8 hours
@teortaxesTex my interpretation of the technique in this paper is that you basically want more momentum and/or higher effective batch size. a gradient only holds so many bits; increasing the # of examples it needs to learn about focuses those bits toward generalization
0
0
5
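A toy reading of the "higher effective batch size" point — my sketch, not the paper's method: accumulate gradients over more examples before each update, so a single update's limited "bits" must fit all of them at once.

```python
import numpy as np

# Toy sketch: raising effective batch size via gradient accumulation.
# Each update averages gradients over `accum_steps` fresh examples.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])  # target weights for a linear model

def grad(w, x, y):
    # gradient of squared error for the linear model y ~ x @ w
    return 2 * x * (x @ w - y)

w = np.zeros(2)
accum_steps = 8   # "effective batch size" multiplier
lr = 0.05

for step in range(200):
    g = np.zeros(2)
    for _ in range(accum_steps):       # accumulate over many examples
        x = rng.normal(size=2)
        y = x @ w_true
        g += grad(w, x, y)
    w -= lr * g / accum_steps          # one update carries info from all of them

print(np.round(w, 2))  # close to [2, -1]
```

Larger `accum_steps` makes each update less noisy, which is one way to cash out the intuition that the same number of update bits gets spread over more evidence.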
@willccbb
will brown
10 hours
@boazbaraktcs @jeremyphoward the exact opposite is true -- inference-time compute is provably sufficient to solve all problems solvable by any circuit, with steps scaling linearly in circuit size, given constant depth + log(size) embedding dim
2
0
2
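A toy illustration of the linear-steps claim. The actual result is about transformers generating a chain of thought; this just shows the evaluation order it relies on — one gate per "reasoning step", so steps scale linearly with circuit size. The circuit here is an arbitrary example.

```python
# Evaluate a boolean circuit one gate per step: total steps are linear
# in circuit size, mirroring the chain-of-thought simulation argument.

# gate list in topological order: (gate name, op, input wire names)
CIRCUIT = [
    ("g1", "AND", ("x0", "x1")),
    ("g2", "OR",  ("x1", "x2")),
    ("g3", "XOR", ("g1", "g2")),   # output gate
]

OPS = {"AND": lambda a, b: a & b,
       "OR":  lambda a, b: a | b,
       "XOR": lambda a, b: a ^ b}

def eval_circuit(inputs: dict) -> tuple:
    wires = dict(inputs)           # wire name -> current value
    steps = 0
    for name, op, (a, b) in CIRCUIT:
        wires[name] = OPS[op](wires[a], wires[b])
        steps += 1                 # one step per gate: linear in size
    return wires[CIRCUIT[-1][0]], steps

out, steps = eval_circuit({"x0": 1, "x1": 1, "x2": 0})
print(out, steps)  # g1=1, g2=1, g3=1^1=0 -> output 0 after 3 steps
```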
@willccbb
will brown
12 hours
@signalgaining the point isn't about using models to do rote calculations, it's about a more general paradigm of learning to solve increasingly hard problems without needing tons of solution data
1
0
3
@willccbb
will brown
12 hours
@wordgrammer it would be pretty unusual if the CEO of a trillion dollar company was 20 years old
2
0
28
@willccbb
will brown
17 hours
@Joanvelja that's a good point, if 10x-ing (?) RL lets you 0.1x test-time compute while still improving accuracy in general, that'd be awesome
1
0
3
@willccbb
will brown
17 hours
RT @leonardtang_: i've been entirely consumed these past few weeks by the LLM-as-a-judge research agenda. there's lots of great work, but…
0
17
0
@willccbb
will brown
17 hours
@Joanvelja they're still sampling 1K solutions for IOI and submitting 50. how good is o3 if you sample + submit 1 solution?
[image attached]
2
0
3
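For context on the sample-vs-submit gap: the standard unbiased pass@k estimator (popularized by the Codex/HumanEval evaluation) makes the difference explicit. The sample counts below are illustrative, not o3's actual numbers.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # unbiased pass@k: given n sampled solutions of which c are correct,
    # probability that at least one of k randomly chosen samples passes
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some chosen one passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# illustrative: 1000 samples, 40 correct
print(round(pass_at_k(1000, 40, 1), 3))   # 0.04 — sample + submit 1
print(round(pass_at_k(1000, 40, 50), 3))  # much higher when submitting 50
```

The gap between the two numbers is exactly what the tweet is pointing at: submitting 50 candidates hides how weak any single sample is.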
@willccbb
will brown
20 hours
@StateSpeed_AB @casper_hansen_ different algorithms that do different things. GRPO is training on many distinct rollouts per prompt + computing relative advantages; sample count isn't apples-to-apples for “training tokens”
1
0
0
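A sketch of the group-relative advantage computation the tweet refers to: GRPO samples a group of rollouts per prompt, then scores each rollout by its reward relative to the group (mean-centered and std-normalized). The reward values here are made up for illustration.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    # rewards: shape (num_prompts, rollouts_per_prompt)
    # each rollout's advantage is its reward relative to its own group
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8  # avoid div-by-zero
    return (rewards - mean) / std

rewards = np.array([[1.0, 0.0, 0.0, 1.0],   # prompt A: two rollouts succeed
                    [0.0, 0.0, 0.0, 1.0]])  # prompt B: one rollout succeeds
print(group_relative_advantages(rewards).round(2))
```

This is why raw sample counts don't compare cleanly across algorithms: each GRPO prompt consumes many rollouts, and only their relative ordering within the group drives the update.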
@willccbb
will brown
21 hours
@DavidFSWD oh just the latter haha (single node)
0
0
0
@willccbb
will brown
1 day
@TheXeophon @iScienceLuvr code execution
[image attached]
1
0
1
@willccbb
will brown
1 day
@TheXeophon i can only dream
0
0
1
@willccbb
will brown
1 day
the “wow that’s crazy” benchmark is impossible to saturate until progress truly plateaus. it’s where you get a model to do something that makes people go “wow that’s crazy”
0
0
20
@willccbb
will brown
1 day
@TheXeophon they're a lot of fun to read sometimes
0
0
6
@willccbb
will brown
1 day
@jchencxh @rogerw0108 vLLM as in @vllm_project not Vision-Language Models
1
0
2
@willccbb
will brown
1 day
1
0
1