Shawn Tan

@tanshawn

Followers: 1K · Following: 1K · Statuses: 1K

MIT-IBM Watson AI Lab / PhD student, Mila, UdeM.

Cambridge, MA
Joined October 2009
@tanshawn
Shawn Tan
9 days
RT @MayankMish98: Accelerating inference using Ladder-Residual - no custom kernels! - device agnostic! - code in pure PyTorch
0
3
0
@tanshawn
Shawn Tan
10 days
@teortaxesTex Yea this is a question I'm interested in as well lol.
0
0
0
@tanshawn
Shawn Tan
10 days
@kalomaze @teortaxesTex Yea I'm thinking about fixed-depth computation here.
0
0
1
@tanshawn
Shawn Tan
14 days
@sbmaruf @agihippo Not known to the public, at least.
1
0
2
@tanshawn
Shawn Tan
16 days
RT @yoavgo: deepseek published their V3 model a month ago and that's where all the efficiency stuff was disclosed and discussed. why are pe…
0
49
0
@tanshawn
Shawn Tan
28 days
@alvations RNNs?
0
0
0
@tanshawn
Shawn Tan
28 days
RT @chinwei_h: 📜 MatterGen published by Nature 📢 A generative model capable of discovering new materials. Super excited for the team! Chec…
0
7
0
@tanshawn
Shawn Tan
30 days
@teortaxesTex Problem-"I don't know" pairs are model specific.
0
0
1
@tanshawn
Shawn Tan
1 month
RT @riccardograzzi: I just had the pleasure of presenting "Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues" at @Autom…
0
13
0
@tanshawn
Shawn Tan
2 months
0
0
4
@tanshawn
Shawn Tan
2 months
RT @Yikang_Shen: Granite 3.1 has been released! The new Granite LLM family has 128K context, better performance, and Apache 2.0 license. ht…
0
5
0
@tanshawn
Shawn Tan
2 months
@CFGeek @teortaxesTex It takes the previous layer (iteration) as input and produces the current layer. It doesn't use later layers as input for the next time-step. It's still 'parallel' across timesteps.
0
0
1
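A minimal toy sketch of the depth recurrence described in the reply above, assuming a Universal-Transformer-style shared block (the class name, sizes, and config here are illustrative, not from the thread): the same block maps the previous iteration's hidden states to the current one for every position at once, so only depth is iterative and the computation stays parallel across timesteps.

# Toy sketch (illustrative only): depth recurrence with one shared block.
# Each iteration takes the previous "layer" as input and produces the current one
# for ALL positions simultaneously; no later layer feeds back into the next timestep.
import torch
import torch.nn as nn

class DepthRecurrentEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_iters=6):
        super().__init__()
        # One block reused across depth iterations (hypothetical configuration).
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_iters = n_iters

    def forward(self, x):  # x: (batch, seq_len, d_model)
        h = x
        for _ in range(self.n_iters):
            # Previous iteration in, current iteration out, for the whole sequence at once.
            h = self.block(h)
        return h

out = DepthRecurrentEncoder()(torch.randn(2, 10, 64))  # parallel across the 10 positions per iteration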
@tanshawn
Shawn Tan
2 months
@CFGeek @teortaxesTex Then I don't see how that differs from UTs, so are they all bad approaches?
1
0
2
@tanshawn
Shawn Tan
2 months
@CFGeek @teortaxesTex Do you see RNNs as "feedback recurrence" or is that a separate thing to you?
1
0
3
@tanshawn
Shawn Tan
2 months
@teortaxesTex @kalomaze I have hope for some diffusion-like tricks though.... Or DEQs?
0
0
4
@tanshawn
Shawn Tan
2 months
@teortaxesTex @kalomaze Computational expressivity-wise I still like UTs over Transformers for many reasons. I just think the original tweet is an oversimplification. There's interplay between computation and 'knowledge' in these models; it seems to me we need more 'knowledge' for the UT.
0
0
3
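One way to make the computation/'knowledge' trade-off in the tweet above concrete (a back-of-the-envelope illustration, not from the thread, with arbitrary sizes): at equal depth, a weight-shared UT-style stack holds 1/L of the parameters of an unshared L-layer Transformer stack, so matching 'knowledge' capacity would mean adding parameters elsewhere.

# Rough parameter-count comparison (illustrative numbers only).
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

d_model, n_heads, depth = 512, 8, 12
layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

vanilla_params = n_params(layer) * depth  # 12 distinct layers
shared_params = n_params(layer)           # one layer applied 12 times (UT-style sharing)
print(vanilla_params / shared_params)     # -> 12.0: same depth of compute, 1/12 the parameter 'knowledge'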
@tanshawn
Shawn Tan
2 months
RT @marktenenholtz: “So actually, pre-2017, we were convinced that LSTMs were the way to go for NLP. Attention was invented as an architect…
0
66
0