![Shawn Tan Profile](https://pbs.twimg.com/profile_images/1259071387970400256/gfrCrWeb_x96.jpg)
Shawn Tan
@tanshawn
Followers: 1K · Following: 1K · Statuses: 1K
MIT-IBM Watson AI Lab / PhD student, Mila, UdeM.
Cambridge, MA
Joined October 2009
RT @MayankMish98: Accelerating inference using Ladder-Residual - no custom kernels! - device agnostic! - code in pure PyTorch
0 replies · 3 retweets · 0 likes
RT @chinwei_h: 📜 MatterGen published by Nature 📢 A generative model capable of discovering new materials. Super excited for the team! Chec…
0 replies · 7 retweets · 0 likes
RT @riccardograzzi: I just had the pleasure of presenting "Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues" at @Autom…
0 replies · 13 retweets · 0 likes
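For context on the result named in that title, here is a toy illustration (my own sketch, not the authors' code): a diagonal linear RNN whose state-transition values are restricted to [0, 1] can only decay or copy its state, but allowing a negative value such as -1 lets even a single scalar state track parity, a canonical state-tracking task.

```python
# Toy sketch (not from the paper): parity with a scalar linear RNN.
# Recurrence: h_t = a_t * h_{t-1}, where a_t is chosen by the input token.
# Restricting a_t to [0, 1] only lets the state decay or hold; permitting
# a negative "eigenvalue" (a_t = -1) makes the sign of h track parity.

def parity_via_linear_rnn(bits):
    h = 1.0
    for x in bits:
        a_t = -1.0 if x == 1 else 1.0  # flip the state's sign on every 1
        h = a_t * h
    return 0 if h > 0 else 1  # sign of the final state encodes parity

assert parity_via_linear_rnn([1, 0, 1, 1]) == 1  # three 1s -> odd parity
assert parity_via_linear_rnn([1, 1, 0, 0]) == 0  # two 1s -> even parity
```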
RT @Yikang_Shen: Granite 3.1 has been released! The new Granite LLM family has 128K context, better performance, and Apache 2.0 license. ht…
0 replies · 5 retweets · 0 likes
@CFGeek @teortaxesTex It takes the previous layer (iteration) as input and produces the current layer. It doesn't use later layers as input for the next time-step. It's still 'parallel' across timesteps.
0 replies · 0 retweets · 1 like
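A minimal sketch of the distinction drawn above (my own illustration in plain PyTorch; the module and its names are hypothetical): depth recurrence feeds the previous iteration's output for the whole sequence back into the same block, so every timestep is updated in parallel within each iteration, and no later layer is used as input for the next timestep.

```python
import torch
import torch.nn as nn

class DepthRecurrentEncoder(nn.Module):
    """Hypothetical sketch: one block iterated over depth, not over time."""
    def __init__(self, d_model=64, n_iters=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.n_iters = n_iters

    def forward(self, x):                # x: (batch, seq_len, d_model)
        h = x
        for _ in range(self.n_iters):    # recurrence over iterations/layers
            h = self.block(h)            # all timesteps updated in parallel
        return h

out = DepthRecurrentEncoder()(torch.randn(2, 16, 64))  # (2, 16, 64)
```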
@CFGeek @teortaxesTex Then I don't see how that differs from UTs, so are they all bad approaches?
1 reply · 0 retweets · 2 likes
@CFGeek @teortaxesTex Do you see RNNs as "feedback recurrence" or is that a separate thing to you?
1 reply · 0 retweets · 3 likes
@teortaxesTex @kalomaze Computational-expressivity-wise, I still like UTs over Transformers for many reasons. I just think the original tweet is an oversimplification. There's interplay between computation and 'knowledge' in these models; it seems to me we need more 'knowledge' for the UT.
0 replies · 0 retweets · 3 likes
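To make the computation-vs-'knowledge' interplay concrete, a rough sketch (my own, with assumed layer sizes): a UT-style model reuses one block across depth, so adding iterations adds computation but no parameters, whereas a vanilla Transformer stack grows its parameter count ('knowledge' capacity) with every layer.

```python
import torch.nn as nn

d_model, n_layers = 64, 6
make_block = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

# Vanilla Transformer: n_layers distinct blocks, parameters scale with depth.
vanilla_stack = nn.ModuleList([make_block() for _ in range(n_layers)])

# UT-style: a single shared block applied n_layers times; same depth of
# computation, but roughly 1/n_layers of the parameters ("knowledge").
shared_block = make_block()

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(vanilla_stack), n_params(shared_block))
```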
RT @marktenenholtz: “So actually, pre-2017, we were convinced that LSTMs were the way to go for NLP. Attention was invented as an architect…
0 replies · 66 retweets · 0 likes