![Andreas Köpf Profile](https://pbs.twimg.com/profile_images/1399645992702644224/A4d8Ivar_x96.jpg)
Andreas Köpf
@neurosp1ke
Followers
7K
Following
9K
Statuses
2K
Exploring ways to algorithmically model our world.
Münster, NRW, Germany
Joined December 2012
Some people don’t know that @SchmidhuberAI in the 90s already analyzed what we all will be working on in 6-12 months: Artificial Curiosity 😉
0
3
10
RT @Lei_Wang_1999: Excited to release tilelang v0.1.0, another pythonic dsl for writing AI kernels with optional layout/pipeline annotation…
0
14
0
RT @winglian: What's the trick? DoRA. I don't have a great hypothesis on why it works yet, but I've upstreamed the changes to TRL. The PR m…
0
29
0
RT @zafstojano: I've already contributed a couple of interesting environments that stress-test 2D spatial reasoning (NQueens, BFS), tokeniz…
0
2
0
RT @jacobaustin132: Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems vie…
0
364
0
@_clashluke Would be interesting to see if l2 for classification improves adversarial robustness.
0
0
1
Simple "learnable temperature" per head. In models with SDPA one could add it afterwards with a bit of long-context fine-tuning. More complicated versions, e.g. - others thought too trivial for paper, e.g.
Scalable-Softmax Is Superior for Attention - Proposes SSMax to process longer context length more effectively - Significantly improves perf in long contexts and key information retrieval
0
6
36
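A minimal sketch of the per-head "learnable temperature" idea from the post above, in plain Python so it runs without any framework. Everything here is an illustrative assumption and not taken from the post or the quoted paper: the function names, the log-parameterisation of the temperature, and the convention that the temperature multiplies the logits (so larger values sharpen the distribution). It also shows why this could be retrofitted onto a model that uses an unmodified SDPA kernel: scaling the query by the per-head factor gives the same weights as scaling the logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, temperature=1.0):
    """Attention weights of one query over `keys` for a single head.

    `temperature` multiplies the usual 1/sqrt(d) scaled logits
    (assumed convention: larger value means sharper attention).
    """
    scale = temperature / math.sqrt(len(q))
    logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(logits)


def scale_query(q, temperature):
    """Fold the temperature into the query so a stock SDPA call can be reused."""
    return [temperature * qi for qi in q]


def multi_head_weights(queries, keys_per_head, log_temps):
    """One weight vector per head; log_temps[h] is the hypothetical learnable scalar.

    Storing the log keeps the effective temperature positive, and
    log_temps[h] = 0.0 reproduces standard attention for that head.
    """
    return [
        attention_weights(q, ks, math.exp(lt))
        for q, ks, lt in zip(queries, keys_per_head, log_temps)
    ]
```

Initialising every `log_temps[h]` to 0.0 leaves a pretrained model's attention unchanged, which is one plausible reason a short fine-tuning run could be enough to learn the extra parameters.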
RT @teortaxesTex: ≈MCTS will make a comeback on the next spin. It's simple, it's provably optimal, it's bitter-pilled. @huajian_xin wasn't…
0
59
0