Daniel Duan
@daniel_duan
Followers
3K
Following
6K
Statuses
28K
Tyranny is the deliberate removal of nuance. SwiftUI @
Joined April 2008
RT @tekknolagi: Write parsers. Not too many. Mostly recursive descent. -- Michael Scott-Pollan (As seen somewhere else but I can't find i…
0
1
0
Dunning-Kruger for RL
Offline reinforcement learning, where an agent tries to improve on a behavior policy by observing another agent's play without actually playing itself, is a harder problem than it appears. The challenge isn't to mimic the provided play, but to learn something better than what you have seen. The difference between online (traditional) RL and offline RL is that online RL is constantly "testing" its model by taking new actions as the model changes, while offline training can bootstrap itself off into a coherent fantasy of great returns untested by reality. It may just be an artifact of value-based RL in particular, but I am inclined to believe it is a more fundamental truth about theoretical and observational science versus experimental science, and life in general.
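To make the failure mode concrete, here is a minimal sketch (a made-up toy, not any particular paper's setup): tabular Q-learning trained on a fixed dataset, where a single optimistic value for an action the data never covers gets bootstrapped into every other estimate, and nothing ever tests it against reality.

```python
import numpy as np

# Illustrative toy (the MDP, rewards, and numbers are all made up):
# offline Q-learning on a fixed dataset. The behavior policy only ever
# took action 0, earning a small reward each step.
n_states, n_actions = 5, 4
gamma, alpha = 0.9, 0.5
Q = np.zeros((n_states, n_actions))

# Fixed dataset of (state, action, reward, next_state) transitions.
dataset = [(s, 0, 0.1, (s + 1) % n_states) for s in range(n_states)]

# Stand-in for approximation error: optimism about an action the
# dataset never covers. Nothing in offline training can correct it.
Q[2, 3] = 10.0

for _ in range(200):
    for s, a, r, s_next in dataset:
        # The backup maxes over ALL actions at s_next, including the
        # never-tried Q[2, 3] -- bootstrapping a "coherent fantasy".
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])

true_value = 0.1 / (1 - gamma)  # ~1.0: best return actually achievable
print(f"learned Q[1, 0] = {Q[1, 0]:.1f} vs. true value ~ {true_value:.1f}")
# Online, the agent would eventually try action 3 in state 2, observe
# the real reward, and deflate Q[2, 3]; offline, the fantasy persists.
```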
0
0
0
RT @liuliu: See why Cerebras and Groq only support distilled version not the MoE version? If they still cannot put the MoE version out with…
0
1
0
RT @iScienceLuvr: Anyone who thinks DeepSeek just came out of nowhere should see this graph. For each model on this graph, weights, code,…
0
690
0
RT @PalmerLuckey: DeepSeek is legitimately impressive, but the level of hysteria is an indictment of so many. The $5M number is bogus. It…
0
4K
0
It’s always funny when ppl equate ppl doing a thing with “China did a thing”.
I think the Deepseek moment is not really the Sputnik moment, but more like the Google moment. If you were around in ~2004, you'll know what I mean, but more on that later.

I think everyone is over-rotated on this because Deepseek came out of China. Let me try to un-rotate you. Deepseek could have come out of some lab in the US Midwest. Say some CS lab couldn't afford the latest nVidia chips and had to use older hardware, but they had a great algo and systems department, found a bunch of optimizations, and trained a model for a few million dollars, and lo, the model is roughly on par with o1. Look everyone, we found a new training method and we optimized a bunch of algorithms! Everyone is like OH WOW and starts trying the same thing. Great week for AI advancement! No need for US markets to lose a trillion in market cap.

The tech world (and apparently Wall Street) is massively over-rotated on this because it came out of CHINA. I get it. After everyone has been sensitized by the H1BLM uproar, we are conditioned to think of OMG Immigrants China as some kind of Alien Other. As though the Alien-Other Chinese Researchers are doing something special that's out of reach, and now China The Empire is somehow uniquely in possession of Super Efficient AI Power that US companies can't compete with. The subtext of "A New Fearsome Power Now Under The Command of the CCP" is what's driving the current sentiment, and it's not really valid.

Like, no. These are guys working on basically the same problems we are in the US, and not only that, they wrote a paper about it and open-sourced their model! It is not some sort of tectonic geopolitical shift; it is just Some Nerds Over There saying "Hey, we figured out some cool shit, here's how we did it, maybe you would like to check it out?"

Sputnik showed that the Soviets could do something the US couldn't ("a new fearsome power"). They didn't subsequently publish all the technical details and half the blueprints. They only showed that it could be done. With Deepseek, if I recall correctly, a lab in Berkeley read their paper and duplicated the claimed results on a small scale within a day.

That's why I say it's like the Google moment in 2004. Google filed its S-1 in 2004 and revealed to the world that it had built the largest supercomputer cluster by using distributed algorithms to network together commodity computers at the best performance-per-dollar point on the cost curve. This was in contrast to every other tech company, which at that time just bought what were essentially larger and larger mainframes, always at the most expensive leading edge of the cost curve.

(To the young people reading this: this will sound incredible.) I worked at PayPal at the time, and in order to keep pace with rising transaction volume, the company was forced to buy bigger and bigger database servers from Oracle. We were totally Oracle's bitch. At one point when we ran into scalability issues, the Oracle reps told us we were their biggest installation, so they had no other reference point for helping us overcome those issues. We literally resorted to flipping random config switches and rebooting.

(This heavily influenced me when I was a young manager later at Facebook. I deliberately torpedoed an Oracle salesman's pitch to get us to switch from open-source MySQL databases to an Oracle contract: of course we had scalability problems, but at least when we had them, we could open up the hood and figure out how to fix it ... assuming we had good enough engineers, and we did. When it's closed-source infra, you're at the mercy of the vendor's support engineers.)

Back to Google: in their S-1, they described how they were able to leapfrog the scalability limits of mainframes and had been (for years!) running a far more massive networked supercomputer, made up of thousands of commodity machines at the optimal performance-per-dollar price point (i.e. not the more expensive leading edge), all knit together by fault-tolerant distributed algorithms written in-house. Some time later, Google published their MapReduce and BigTable papers, describing the algorithms they'd used to manage and control this massively more cost-effective and powerful supercomputer.

Deepseek is MUCH more like the Google moment, because Google essentially described what it did and told everyone else how they could do it too. In Google's case, a fair bit of time elapsed between when they revealed to the world what they were doing and when they published papers showing everyone how to do it. Deepseek, in contrast, published their paper alongside the model release.

I've also written about how I think this is a demonstration of Deepseek's trajectory, but that's no different from Google in ~2004 revealing what it was capable of. Competitors will still need to gear up and DO the thing, but they've moved the field forward. It's not like Sputnik, where the Soviets had developed technology unreachable to the US; it's more like Google saying, "Hey, we did this cool thing, here's how we did it."

There is no reason to think nVidia and OAI and Meta and Microsoft and Google et al. are dead. Sure, Deepseek is a new and formidable upstart, but doesn't that happen every week in the world of AI? I am sure that Sam and Zuck, backed by the power of Satya, can figure something out. Everyone is going to duplicate this feat in a few months and everything just got cheaper. The only real consequence is that AI utopia/doom is now closer than ever.

==== Bonus: This is also a little similar to the Ethereum PoS moment, in that AI finally has a counterpoint to the environmentalists who say AI uses so much electricity. We just brought down the cost of inference by 97%!
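To make the MapReduce reference concrete, here's a toy, single-process sketch of the programming model (purely illustrative; the word-count task and function names are mine, and Google's actual system scheduled these phases fault-tolerantly across thousands of commodity machines):

```python
from collections import defaultdict

# Toy, single-process illustration of the MapReduce programming model.
# The point of the S-1 story isn't word counting: once a computation is
# expressed as map + shuffle + reduce, it can run on whatever cheap
# hardware sits at the best performance-per-dollar point.

def map_phase(doc_id, text):
    # map: emit (key, value) pairs -- here, (word, 1) for each word
    for word in text.split():
        yield word.lower(), 1

def shuffle(pairs):
    # shuffle: group all values by key (the framework does this for real)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # reduce: fold each key's values into one result -- here, a sum
    return key, sum(values)

documents = {1: "the quick brown fox", 2: "the lazy dog the end"}
pairs = [kv for doc_id, text in documents.items()
         for kv in map_phase(doc_id, text)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts["the"])  # -> 3
```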
0
0
0
RT @bradtgmurray: Today Google open sourced PebbleOS and it makes me incredibly happy. That codebase and that team I still have so much pri…
0
415
0
Is this how AI is going to take our jerbs?
Not long ago, I used to have a more optimistic impression of Rust users. I would not have guessed that so many otherwise-judicious people would go for blatantly AI-"maintained" Rust libraries. The `serde_yml` crate is a fork of a high-quality but unmaintained library. In the fork, the AI has taken the initiative to add a big heap of stuff that is variously complete nonsense or unsound. On top of this, the crate's documentation has been broken on docs.rs for the last 5 months because the AI hallucinated a nonexistent rustdoc flag into the crate's configuration. And yet 134 other published packages have chosen to adopt this? Including high-profile, competently maintained projects like Jiff (for tests only), axodotdev, Wasmer, MiniJinja, and Holochain. This does not bode well. The bar for someone to do better at a YAML library is so low.
0
0
2
RT @davidtolnay: Not long ago, I used to have a more optimistic impression of Rust users. I would not have guessed that so many otherwise-j…
0
131
0
RT @headinthebox: I am watching the DeepSeek R1 circus with much amusement. It is not even funny how obvious it is that smarter software…
0
55
0