William Wolf

@willium

3K Followers · 63K Following · 9K Statuses

Two LLMs in a trench coat // Now: Founder of @GestaltHDI // Then: Founder of Bayes (acq. Airtable)

New York
Joined January 2010
@willium
William Wolf
3 hours
@danshipper Jealous
@willium
William Wolf
2 days
the vast majority of software is bad. like really, really bad. writing good software is fucking hard.
@willium
William Wolf
2 days
so cute
@uavster
Nacho Mellado
3 days
Apple gets it. Robots are going to be everywhere, but they won’t look like robots. Check out their new paper ELEGNT. I believe this is the future of everyday objects: helpful and human.
@willium
William Wolf
2 days
@mollycantillon so cool
@willium
William Wolf
3 days
@spencerrascoff Exciting!
@willium
William Wolf
3 days
@netspencer Why not go the other direction? Give me the answer/summary, and then let me “research” from there
@willium
William Wolf
3 days
@seanxthielen can’t handle the heat, stay out of kitchen etc etc
@willium
William Wolf
3 days
@brian_armstrong I believe this was the intention behind @Thenext50Us
@willium
William Wolf
4 days
@zehranaqvi_ so cool
@willium
William Wolf
5 days
@gasca i'm not really sure anyone knows the answer to this. the models are so vibes based.
@willium
William Wolf
8 days
@rob_soko apps are hard man
@willium
William Wolf
8 days
I should probably live in San Francisco. I’m working on a startup. Many of my friends and most of my family are there. I grew up there, and lived there for 20 years. But I’m also a (Zionist) Jew. I want to have Jewish community and marry a Jew. This is Hinge in SF.
[two images attached]
@willium
William Wolf
9 days
just realized this is from @nelsonfliu! <3 UW CSE
@willium
William Wolf
11 days
@signulll @duolingo I've been thinking a lot about this too! 100% agree - this is the future. Excited to see what @oboelabs is doing here. Unfortunately the method Duolingo uses for learning languages is not very good. @speak is based on Pimsleur's method, which is superior and IMO a better basis.
@willium
William Wolf
12 days
this plus Jevons paradox leads me to believe that demand for compute will only go up with the news about DeepSeek. If it's cheaper to train models, I bet an order of magnitude more models are trained. The real short here is the big labs, who rely on barriers to entry as a moat.
@karpathy
Andrej Karpathy
12 days
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent).

I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations.

Data has historically been seen as a separate category from compute, but even data is downstream of compute to a large extent - you can spend compute to create data. Tons of it. You've heard this called synthetic data generation, but less obviously, there is a very deep connection (equivalence even) between "synthetic data generation" and "reinforcement learning". In the trial-and-error learning process in RL, the "trial" is model generating (synthetic) data, which it then learns from based on the "error" (/reward). Conversely, when you generate synthetic data and then rank or filter it in any way, your filter is straight up equivalent to a 0-1 advantage function - congrats you're doing crappy RL.

Last thought. Not sure if this is obvious. There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all *magic* is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol. And 2 is the "aha moment" when the DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.

(Last last thought/reference this time for real is that RL is powerful but RLHF is not. RLHF is not RL. I have a separate rant on that in an earlier tweet.)
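[A minimal toy sketch of the "your filter is a 0-1 advantage function" point above, not from the post itself: sample "generations" from a made-up policy, keep only what a binary filter accepts, and update the policy only on the keepers. The policy, reward, and update rule are all invented for illustration; rejected samples contribute no signal, which is exactly what fine-tuning on filtered synthetic data does.]

```python
import random

# Toy illustration: "synthetic data filtering" == RL with a 0-1 advantage.
# Everything here (policy, reward, update rule) is made up for the sketch.

random.seed(0)


def sample_generation(policy):
    """Sample one 'generation' from a toy policy (a dict of probabilities)."""
    return "good" if random.random() < policy["good"] else "bad"


def reward(generation):
    """Binary filter: keep generations we like, reject the rest."""
    return 1.0 if generation == "good" else 0.0


def rl_step(policy, generations, lr=0.1):
    """One policy-gradient-style update with advantage = reward (0 or 1).
    Rejected samples (advantage 0) contribute nothing -- exactly what
    'fine-tune only on the filtered samples' does."""
    for g in generations:
        adv = reward(g)      # the filter acting as a 0-1 advantage function
        if adv == 0.0:
            continue         # filtered out: no gradient signal
        # Nudge probability mass toward the kept sample.
        policy[g] = min(1.0, policy[g] + lr * adv)
        other = "bad" if g == "good" else "good"
        policy[other] = 1.0 - policy[g]
    return policy


policy = {"good": 0.5, "bad": 0.5}
for step in range(5):
    batch = [sample_generation(policy) for _ in range(8)]
    policy = rl_step(policy, batch)
    print(step, {k: round(v, 2) for k, v in policy.items()})
```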
@willium
William Wolf
12 days
@seyitaylor Gimme(, gimme, gimme a man after midnight)