![Andrew Dickson Profile](https://pbs.twimg.com/profile_images/1670144737557217280/gcMyRT56_x96.png)
Andrew Dickson (@xordrew)
3 Followers · 11 Following · 19 Statuses · Joined June 2023
@katie_kang_ @setlur_amrith @its_dibya @JacobSteinhardt @svlevine @aviral_kumar2 It's really interesting that memorization doesn't displace existing generalization. But I guess the converse, where a model learns a generalizable solution after memorizing, doesn't really happen? Any thoughts on why?
@nickcdryan @jxmnop Do you know if this was ever updated for 2024-era models? I wish I could see something like this for Mamba, Hyena, xLSTM, even RAG if you stretch the premise a little.
@RichmanRonald @francoisfleuret I wonder how bad that actually is. There's the standard intuition built around attention being a soft information router, but you can also just call it a reasonable way to factorize a giant matrix.
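A toy numpy sketch of that second reading (my own illustration, nothing from the thread): the pre-softmax logits QKᵀ are a rank-at-most-d factorization of an n×n token-mixing matrix, and the row-wise softmax is what layers the "soft router" behavior on top.

```python
import numpy as np

n, d = 512, 64                        # sequence length, head dimension
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv
logits = Q @ K.T / np.sqrt(d)         # n x n, but rank <= d by construction
A = np.exp(logits - logits.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)     # row-wise softmax: the token-mixing matrix
out = A @ V                           # standard single-head attention output

print("rank of QK^T logits:", np.linalg.matrix_rank(logits))   # caps out at 64
print("rank of softmaxed mix:", np.linalg.matrix_rank(A))      # softmax lifts the rank
```

For n much larger than d, the logit matrix's rank is capped at the head dimension, which is the sense in which attention is a cheap parameterization of a giant matrix.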
@francoisfleuret A nice bit of ML lore is that in the original AlexNet, splitting the network to fit across GPUs caused one half to learn texture patterns and the other to learn color patterns. Apparently the effect was pretty consistent.
@owl_poster I kind of want to try more comparisons here, maybe even just with AlphaFold2's internal embeddings. Isn't pLDDT being high for both disordered and fake proteins completely correct behavior?
@NikoMcCarty In the essay on low Reynolds numbers, there's a really interesting note on how almost all liquids have viscosity higher than water's. Did anyone get around to explaining that in the last 50 years?
@andrewwhite01 This makes a lot more sense if you know that lean mass hyper-responders are a group whose cholesterol rises sharply on a ketogenic diet. Oreos are pretty darn un-ketogenic.
@evanjconrad I've been working on tuning protein language models for iterative design. It's an academic project, but a pretty interesting one.
@francoisfleuret This was the key equation for ballpark estimates. The bottom line is that if you push on things like clock rate or power use, error rates increase exponentially. On the upside, the acceptable error rate was listed as something like 1/computer/year, so there are lots of OOMs to work with.
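For a rough sense of the numbers, assuming the textbook thermal-noise form p ≈ exp(−E_signal/kT) per switching event (not the actual equation from the figure, which I don't have in front of me), and made-up device counts:

```python
import numpy as np

kT = 4.14e-21                 # J, room temperature (~300 K)
N, f = 1e9, 1e9               # devices and clock rate (Hz); illustrative scales only
T_year = 3.15e7               # seconds per year
switches_per_year = N * f * T_year

# signal energy needed for roughly 1 error per computer per year
E_req = kT * np.log(switches_per_year)
print(f"required signal energy ~ {E_req / kT:.0f} kT")    # around 60 kT

# shave the signal energy (chasing power) and the error rate explodes
for scale in (1.0, 0.75, 0.5):
    errors = switches_per_year * np.exp(-scale * E_req / kT)
    print(f"E = {scale:.2f} * E_req -> ~{errors:.1e} errors/year")
```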
@francoisfleuret That said, that's assuming you keep normal architectures and just tolerate some low-importance bit flips. I'd imagine you can get a lot more creative.
@TimothyDuignan I love this area, and the whole general idea of noisy computing. A random example I liked was oscillator computing, where you just let coupled oscillators converge on a minimum-energy ensemble.
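A toy sketch of the idea (my own illustration): phase oscillators coupled along the edges of a random graph relax toward a minimum of E(θ) = Σ over edges of cos(θ_i − θ_j), and thresholding the settled phases reads out a cut of the graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.3]

theta = rng.uniform(0, 2 * np.pi, size=n)
dt, steps = 0.05, 2000
for t in range(steps):
    grad = np.zeros(n)
    for i, j in edges:
        s = np.sin(theta[i] - theta[j])    # dE/dtheta_i = -s, dE/dtheta_j = +s
        grad[i] -= s
        grad[j] += s
    theta -= dt * grad                                     # relax downhill in energy
    theta += 0.01 * (1 - t / steps) * rng.normal(size=n)   # annealed noise to escape shallow traps

side = np.cos(theta) > 0                   # threshold phases into two clusters
cut = sum(side[i] != side[j] for i, j in edges)
print(f"cut {cut} of {len(edges)} edges")
```

At the energy minimum, neighbors sit roughly in anti-phase, so the two phase clusters correspond to a large cut; real oscillator Ising machines add injection locking to force the binarization instead of thresholding after the fact.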
@owl_poster Have you seen any good overviews of the types of PPI databases? It's always been embarrassingly unclear to me what the interactions specifically mean, and how much variance and how many false positives you'd get from different experimental methods.
@DmitryRybin1 @du_yilun I've never heard that. Could I literally solve Sudoku puzzles with a convex solver?
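Guessing at what's meant: write the usual 0/1 Sudoku formulation, relax the binaries to [0, 1], and hand the resulting LP to a convex solver. The relaxed optimum often comes back integral for easy puzzles, but there's no guarantee for hard ones. solve_sudoku_lp is just a hypothetical helper name.

```python
import numpy as np
import cvxpy as cp

def solve_sudoku_lp(clues):
    """clues: 9x9 numpy array, 0 for blanks, 1-9 for given digits."""
    # X[9*r + c, d] ~ "cell (r, c) holds digit d+1", relaxed from {0,1} to [0,1]
    X = cp.Variable((81, 9), nonneg=True)
    cons = [cp.sum(X, axis=1) == 1]                      # one digit per cell
    for d in range(9):
        for r in range(9):                               # each digit once per row
            cons.append(cp.sum(X[[9 * r + c for c in range(9)], d]) == 1)
        for c in range(9):                               # each digit once per column
            cons.append(cp.sum(X[[9 * r + c for r in range(9)], d]) == 1)
        for br in range(3):                              # each digit once per 3x3 box
            for bc in range(3):
                idx = [9 * (3 * br + i) + 3 * bc + j for i in range(3) for j in range(3)]
                cons.append(cp.sum(X[idx, d]) == 1)
    for r in range(9):                                   # pin the given clues
        for c in range(9):
            if clues[r, c]:
                cons.append(X[9 * r + c, int(clues[r, c]) - 1] == 1)
    cp.Problem(cp.Minimize(0), cons).solve()             # pure feasibility LP
    return np.argmax(X.value.reshape(9, 9, 9), axis=2) + 1
```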
@TimothyDuignan Awesome article, thanks for sharing! It's funny how much more relevant this feels post AF-3. At this point I'm definitely wondering where we'll settle on the continuum between actual MD and direct ML prediction for the harder problems in molecule design.
@TimothyDuignan It always surprises me that diffusion models and AlphaFold are in some sense 'SOTA' models for NNPs, since they're purely generative models with no real physics baked into training. Have you seen anyone integrate them with MD in a convincing way?