![Peter Beierle π Profile](https://pbs.twimg.com/profile_images/1534541046549454848/eX2Mwmsk_x96.jpg)
Peter Beierle π
@DoubtingPeter
Followers
22
Following
2K
Statuses
309
This is an Opinion-based Twitter Account with not too many opinions
None of Your Business
Joined December 2009
Lots of hot takes on whether it's possible that DeepSeek made training 45x more efficient, but @doodlestein wrote a very clear explanation of how they did it. Once someone breaks it down, it's not hard to understand. Rough summary:
* Use 8-bit instead of 32-bit floating point numbers, which gives massive memory savings
* Compress the key-value indices, which eat up much of the VRAM; they get 93% compression ratios
* Do multi-token prediction instead of single-token prediction, which effectively doubles inference speed
* Mixture of Experts model decomposes a big model into small models that can run on consumer-grade GPUs
0
0
0
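The memory arithmetic behind the first two points of the summary above can be sketched in a few lines. This is only a back-of-the-envelope illustration: the parameter count and cache size are hypothetical placeholders, not DeepSeek figures, and reading "93% compression ratio" as "93% smaller" is an assumption about what the tweet means.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store num_params values at a given bit width, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def compressed_size(original: float, reduction: float) -> float:
    """Size left after shrinking `original` by the fraction `reduction`.

    Assumption: a "93% compression ratio" means the data becomes 93% smaller.
    """
    return original * (1.0 - reduction)


# Hypothetical model size, chosen only to make the arithmetic concrete.
HYPOTHETICAL_PARAMS = 1_000_000_000

fp32_gib = weight_memory_gib(HYPOTHETICAL_PARAMS, 32)
fp8_gib = weight_memory_gib(HYPOTHETICAL_PARAMS, 8)
print(f"32-bit weights: {fp32_gib:.2f} GiB")
print(f" 8-bit weights: {fp8_gib:.2f} GiB")
print(f"reduction factor: {fp32_gib / fp8_gib:.0f}x")

# Hypothetical 100-unit key-value cache shrunk by 93%.
print(f"KV cache after compression: {compressed_size(100.0, 0.93):.1f} units")
```

Going from 32 to 8 bits cuts storage per value by a factor of four regardless of model size, which is why the saving scales with the parameter count.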
RT @anishgiri: So, Nepo-Dubov knight dance was not a good enough joke? Hm. FIDE, if you set up the Sense of humor commission, I am ready…
0
113
0
RT @agadmator: Magnus should just tweet at the end of every year who the classical, rapid, blitz and Fischer-Random World champions are (or…
0
433
0
@Ross_G_Menzies @HansMokeNiemann FIDE had the opportunity to resolve this with Armageddon and failed to. Regardless, there is a strong case that New York Penal Law 180.50 was violated here, where both Ian and Magnus seem to derive substantial benefits from this match fixing.
1
0
14
RT @DoubtingPeter: @MattWalshBlog re: your argument concerning consciousness "how can a whole brain have consciousness if individual brain…
0
1
0
@MattWalshBlog Re: your argument concerning consciousness, "how can a whole brain have consciousness if individual brain cells do not" ignores the concept of emergence in nature. A single molecule of water isn't wet, nor does a single copper atom conduct electricity.
0
1
0
@chesscom If Rapport finds it worth his time, I suspect that he can command a very high price to be a second for others in the future.
0
0
0
@RenegadeUniv @elonmusk Interesting idea, but it's not clear how to implement it. If the NGOs were funded based on the change [reduction] in homelessness, that may incentivise a steady influx of homelessness. Or, being funded based on a lack of homelessness would lead to NGOs being paid to do nothing long-term.
0
0
1
@MaxDerakhshani 2. The conventional wisdom that mandatory spending and military cuts are politically untenable and 3. That such reforms, particularly social security and medicare, are not going to be realized until the boomer generation is gone.
0
0
0
@ChessMike @chesscom @rjrapport Fantastic! Fingers crossed that he can again bring the magic to Team Ding!
0
0
17
@michaelmalice Hey Michael, do you think now is a good time to reach out to the left about the idea of a national divorce, or would it be a wasted effort?
0
0
0