![jakeyyy Profile](https://pbs.twimg.com/profile_images/1823571543173079040/2kOWxY2r_x96.jpg)
jakeyyy
@irohsharpeniroh
Followers
318
Following
266K
Statuses
4K
@stochasticchasm the KV cache will be built differently if you are passing latent reps. so yes given a big context the context outweighs a single token choice, but it applies to each one
1
0
3
@kitten_beloved @JaMikeyMike because the bottleneck shifts to product and biz, not engineering throughput
0
0
4
@flybottlemist @gptbrooke my brother once walked in on me at 5 holding a turd up in a wad of toilet paper, studying it closely
1
0
4
@RylanSchaeffer I recently discovered an improved algorithm for LU decomposition when I realized all I need is U
0
0
2
@michaelyliu6 @dwarkesh_sp @_sholtodouglas @TrentonBricken which is not wrong, but there's a clear distinction here between what happens in a normal transformer forward pass and what latent recurrence is/does
0
0
1
@dwarkesh_sp @_sholtodouglas @TrentonBricken so the model is still only "giving" itself token reps as inputs to work with for the next token, it just gets to see its previous work at every step of the way via the cache
0
0
4
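A rough sketch of the point in the last post, assuming a toy two-layer attention stack in pure Python. Every name here (`EMBED`, `W_Q`, `W_K`, `W_V`, `decode_with_cache`, `full_forward`) is hypothetical, the weights are made up, and sharing them across layers is only for brevity. At each step the only new input is a token embedding; the "previous work" is available only as cached keys and values from earlier positions.

```python
import math

# Hypothetical toy weights and embeddings, shared across layers for brevity.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.3]]
W_V = [[0.3, -0.2], [0.1, 0.4]]
EMBED = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Softmax attention of one query over the given keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def decode_with_cache(token_ids, n_layers=2):
    """Step through tokens one at a time, keeping a per-layer KV cache."""
    caches = [([], []) for _ in range(n_layers)]
    outputs = []
    for tok in token_ids:
        h = EMBED[tok]  # the only new input at this step is a token embedding
        for k_cache, v_cache in caches:
            # Earlier positions are visible only through these cached entries.
            k_cache.append(matvec(W_K, h))
            v_cache.append(matvec(W_V, h))
            a = attend(matvec(W_Q, h), k_cache, v_cache)
            h = [hi + ai for hi, ai in zip(h, a)]  # residual connection
        outputs.append(h)
    return outputs


def full_forward(token_ids, n_layers=2):
    """Recompute every position from scratch with causal attention, no cache."""
    hs = [EMBED[tok] for tok in token_ids]
    for _ in range(n_layers):
        keys = [matvec(W_K, h) for h in hs]
        values = [matvec(W_V, h) for h in hs]
        hs = [
            [hi + ai for hi, ai in zip(
                h, attend(matvec(W_Q, h), keys[:t + 1], values[:t + 1]))]
            for t, h in enumerate(hs)
        ]
    return hs
```

In this sketch the cached and uncached paths produce the same hidden states, so the cache changes nothing about what is fed in. Latent recurrence would instead pass the previous step's final `h` back in as the next input, which changes what lands in the cache, as the first post notes.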