🎱 DataPeDD
@DataPeDD
Followers
650
Following
7K
Statuses
3K
GME x BBBY x CYDY to Uranus. DD for ML, Retail, Biotech. Tweets, Likes, or Retweets are only personal opinions, not financial advice, nor am I a financial advisor.
Joined April 2023
wow @VictorTaelin
This is wild: UC Berkeley shows that a tiny 1.5B model can beat o1-preview on math via RL! They applied simple RL to Deepseek-R1-Distilled-Qwen-1.5B on 40K math problems, trained at 8K context, then scaled to 16K and 24K. 3,800 A100 hours ($4,500) to beat o1-preview on math! Best of all, they open-sourced everything: the model, the training code (based on ByteDance's verl library), and the dataset.
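A minimal sketch of what that RL recipe could look like, assuming Hugging Face TRL's GRPOTrainer rather than the verl library the team actually used; the toy dataset, reward function, and hyperparameters below are illustrative, not the paper's:

```python
# Sketch: RL fine-tuning of a small distilled model on math problems with GRPO.
# Uses TRL's GRPOTrainer (an assumption; the original work used ByteDance's verl).
import re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy stand-in for the 40K-problem math dataset mentioned in the tweet.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 12 * 7? Put the final answer inside \\boxed{}."],
    "answer": ["84"],
})

def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 if the boxed answer matches the reference, else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"\\boxed\{(.+?)\}", completion)
        rewards.append(1.0 if match and match.group(1).strip() == ref else 0.0)
    return rewards

config = GRPOConfig(
    output_dir="qwen1.5b-math-grpo",
    max_prompt_length=512,
    max_completion_length=8192,   # start at 8K context; later runs extend to 16K/24K
    num_generations=8,            # group size for GRPO's relative advantages
    per_device_train_batch_size=8,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # presumed base checkpoint
    reward_funcs=correctness_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```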
0
0
2
Just imagine Pulte is sworn in before Congress and the whole world learns about the PPShow from some Democratic congressperson 😂 @ThePPseedsShow
2
0
23
@VictorTaelin Bisimulation Types (Labeled Markov Processes) + RL + Factor Graphs are so awesome. Genius professor Prakash Panangaden: all you need is a metric (on measurable spaces):
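To make "all you need is a metric" concrete, here is one standard formulation of the bisimulation metric from the Ferns–Panangaden–Precup line of work, written for a finite MDP rather than general Labeled Markov Processes; the constants c_R, c_T are notation assumed for this sketch, not quoted from a specific paper:

```latex
% Bisimulation metric as the unique fixed point of an operator F on state
% pseudometrics (c_T < 1 makes F a contraction):
F(d)(s, t) = \max_{a}\Big( c_R\,\lvert r(s,a) - r(t,a)\rvert
              \;+\; c_T\, W_d\big(P(\cdot \mid s, a),\, P(\cdot \mid t, a)\big) \Big)
% W_d is the Kantorovich (Wasserstein-1) distance between the two next-state
% distributions, using the current pseudometric d as the ground distance.
% States whose fixed-point distance is 0 are exactly the bisimilar states.
```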
0
0
0
RT @Hesamation: 7GB VRAM is all you need to train your own reasoning model. Unsloth made some great points: > GRPO is now optimized to use…
0
316
0
RT @HumbleandH: @tZERO @BeyondBYON @AlderLaneEggs @fleckcap #tzero #tzrop #Tzero_Bros #byon #gme News!!!!
0
4
0
@HarryBoby4 Look into your teeth and stellate ganglion block
Full treatment plan for #LongCovid, #POTS, #MECFS, EBV, monocytes, bone marrow, auto-antibodies (GPCR), based on the latest research in Long Covid and ME/CFS (no medical advice; always consult your doctor before taking any medication or therapy): 👇
0
0
3
@MichaelArnaldi What about it? It is from the developer of TypeRunner and uses runtime types via a TypeScript transformer (@deepkit/type-compiler)
1
0
0
Physics laws reveal how Next-token Prediction (NTP) models actually learn and why they need so much energy. Information conservation explains why bigger models need more training data.

🎯 Original Problem: Current auto-regressive models using Next-token Prediction (NTP) require massive datasets and computational power, but we lack understanding of why this leads to intelligence emergence. We need to uncover the fundamental physics behind NTP to optimize model training.
-----
🔧 Solution in this Paper:
→ Introduced First Law of Information Capacity (IC-1): ηN = D(H − L), showing intelligence emerges through information transfer from dataset to model parameters
→ Proposed Second Law of Information Capacity (IC-2): E0 = ηN(kB T ln 2), establishing minimum energy requirements for training
→ Demonstrated model training is essentially compressing dataset information, with information capacity η indicating compression efficiency
-----
💡 Key Insights:
→ Model training follows information conservation law: no information is lost, only transferred
→ Dataset entropy can be estimated using initial model loss
→ Information capacity (η) typically falls between 0.115 and 0.268 for current models
→ Found direct proportional relationship between model size (N) and training tokens (D)
-----
📊 Results:
→ Validated theoretical framework against OpenAI's Scaling Laws
→ Proved compatibility with Knowledge Capacity Scaling Laws
→ Demonstrated universal applicability across all auto-regressive architectures
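A quick back-of-the-envelope reading of IC-1 and IC-2, using an η from the 0.115–0.268 range quoted above and an illustrative 1.5B-parameter model; the η value and temperature are assumptions for the example, not numbers from the paper:

```python
# Plugging illustrative numbers into the two laws quoted above.
import math

eta = 0.2           # information capacity (bits per parameter), within the quoted range
N = 1.5e9           # model parameters (illustrative)
k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # temperature, K (assumed)

# IC-1: eta * N = D * (H - L)
# -> total information transferred from the dataset into the weights, in bits
bits_transferred = eta * N
print(f"D * (H - L) = eta * N ≈ {bits_transferred:.2e} bits")

# IC-2: E0 = eta * N * (k_B * T * ln 2)
# -> Landauer-style lower bound on the energy needed to store those bits
E0 = eta * N * k_B * T * math.log(2)
print(f"thermodynamic minimum energy E0 ≈ {E0:.2e} J")
```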
0
0
0
@VictorTaelin Interesting person. Very good at compilers/type-checkers (first byte-code TypeScript type checker)
Anyone looking for a skilled developer (C++, TypeScript, Python, Machine Learning)? I'm looking for new challenges.
0
0
1
What approach will Bend use to check types, @VictorTaelin? Will it use some byte-code representation (and be 10,000x faster than TypeScript)?
Yes, of course there is always an alternative: a byte-code VM. Instead of walking the AST while computing types, we convert the AST into a byte-code representation and feed that into a VM. We could even make `.d.ts` files directly byte-code, so parsing is optimal. A toy sketch of the idea follows below.
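A toy illustration of that byte-code-VM idea, in Python for brevity; the instruction set and assignability rule below are invented for the example and are not TypeRunner's or Bend's actual design:

```python
# Toy sketch: type-level operations compiled to flat instructions and executed
# by a small stack VM, instead of re-walking an AST on every check.

# Instruction opcodes for a minimal type language.
PUSH_PRIM, MAKE_UNION, CHECK_ASSIGNABLE = range(3)

def run(bytecode):
    """Execute type byte-code on a stack; returns the final stack."""
    stack = []
    for op, arg in bytecode:
        if op == PUSH_PRIM:                 # push a primitive type by name
            stack.append(frozenset([arg]))
        elif op == MAKE_UNION:              # pop `arg` types, push their union
            parts = [stack.pop() for _ in range(arg)]
            stack.append(frozenset().union(*parts))
        elif op == CHECK_ASSIGNABLE:        # pop target and source, push bool
            target, source = stack.pop(), stack.pop()
            stack.append(source <= target)  # source assignable to a union target
    return stack

# "string | number" accepts "number": the kind of check a .d.ts compiled to
# byte-code could answer without re-parsing or re-walking any AST.
program = [
    (PUSH_PRIM, "number"),        # source type
    (PUSH_PRIM, "string"),
    (PUSH_PRIM, "number"),
    (MAKE_UNION, 2),              # target type: string | number
    (CHECK_ASSIGNABLE, None),
]
print(run(program))  # [True]
```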
0
0
0