Charles 🎉 Frye

@charles_irl

Followers
10,421
Following
1,858
Media
1,848
Statuses
6,727
Pinned Tweet
@charles_irl
Charles 🎉 Frye
6 months
My new job is to get everyone else to see what I see in Modal: the future of data-driven computing powered by open generative models trained at the scale of the web and adapted to end-user needs by code, customization, and continual improvement. LFG.
7
3
67
@charles_irl
Charles 🎉 Frye
2 years
10/10, no notes
Tweet media one
Tweet media two
Tweet media three
Tweet media four
41
533
5K
@charles_irl
Charles 🎉 Frye
4 years
They say you die twice: the first time when your heart beats its last, the second time when the last person who knows your name dies. This indicates that life employs a garbage collector based on reference counts. In this essay, I will
34
880
4K
@charles_irl
Charles 🎉 Frye
9 months
PagedAttention, Virtual Context, Speculative Decoding, Register Tokens: the last year has seen many ideas from systems programming applied to LLMs. Not many folks live in that intersection, so I wrote an explainer post to make them a bit more accessible!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
18
285
1K
@charles_irl
Charles 🎉 Frye
1 year
I've been working on neural networks for almost a decade. The best way to describe how I'm currently feeling is like a dog that caught up with the car it was chasing, sunk its teeth in the fender, and is now traveling at 80 mph -- tail wagging, jaw tiring.
33
140
1K
@charles_irl
Charles 🎉 Frye
1 year
"any other questions before we wrap up this interview?" "yeah - are y'all an Adderall CEO, LSD CTO startup or an LSD CEO, Adderall CTO startup?"
21
88
1K
@charles_irl
Charles 🎉 Frye
2 years
did an entire PhD on optimization just to end up here for most projects
Tweet media one
13
48
1K
@charles_irl
Charles 🎉 Frye
3 years
If you're like me, you've written a lot of PyTorch code without ever being entirely sure what's _really_ happening under the hood. Over the last few weeks, I've been dissecting some training runs using @PyTorch's trace viewer in @weights_biases. Read on to learn what I learned!
6
161
851
@charles_irl
Charles 🎉 Frye
1 month
New guide to using CUDA on @modal_labs just dropped. It began its life as a document called "I am fucking done not understanding the CUDA stack", and after readelf-ing CUDA binaries, RTFMing the driver docs, & writing homebrew kernels, I'm excited to share it with the world!
Tweet media one
4
98
731
@charles_irl
Charles 🎉 Frye
1 year
the year is 2015. i am struggling to install CUDA drivers. the year is 2020. i am struggling to install CUDA drivers. the year is 2023. i am
Tweet media one
43
33
635
@charles_irl
Charles 🎉 Frye
6 months
I got a new job at @modal_labs! In my first week here, the team has added H100s, shipped an integration with @vercel, and merged >200 PRs. Incredible combination of velocity, product, & tech. I want to tell the story of my journey from early adopter to happy user to employee.
Tweet media one
Tweet media two
Tweet media three
47
21
543
@charles_irl
Charles 🎉 Frye
1 year
if the guy who wrote the book on convex optimization is willing to call it "early AGI", perhaps it's worth looking past the breathless hype on this one
@arankomatsuzaki
Aran Komatsuzaki
1 year
Sparks of Artificial General Intelligence: Early experiments with GPT-4 Reports on their investigation of an early version of GPT-4, when it was still in active development by OpenAI.
Tweet media one
15
122
693
10
34
413
@charles_irl
Charles 🎉 Frye
3 years
MLOps? more like YAMLOps mirite
10
19
380
@charles_irl
Charles 🎉 Frye
7 months
"Why Take an Operating Systems Course"
Tweet media one
3
42
344
@charles_irl
Charles 🎉 Frye
2 years
today is the perfect day to a-nn-ounce the official release of a software library i've been working on in stealth mode this past month panndas: neural networks in pandas
Tweet media one
Tweet media two
Tweet media three
5
46
341
@charles_irl
Charles 🎉 Frye
7 months
For the last year, I've been thinking and reading about LLMs, operating systems, and the history of computing, trying to decide what I think about "Software 3.0" and the "llmOS". I synthesized my takeaways in a talk at @ScaleByTheBay, Parallel Processors, now available online!
Tweet media one
5
40
325
@charles_irl
Charles 🎉 Frye
9 months
step 1. run MemGPT on GPT-4 for 10k tasks, approx 100M tokens
step 2. annotate/filter incorrect executions
step 3. finetune Mistral 7B on the data
step 4. release the weights and unlock OSS LLM kernels
10
17
297
@charles_irl
Charles 🎉 Frye
3 years
New video series out this week (and into next!) on the @weights_biases YouTube channel. They're Socratic livecoding sessions where @_ScottCondron and I work through the exercise notebooks for the Math4ML class. Details in 🧵⤵️
3
58
290
@charles_irl
Charles 🎉 Frye
2 months
The model is not the moat.
Tweet media one
16
29
278
@charles_irl
Charles 🎉 Frye
2 months
This section was originally entitled "Let's Not Fuck Up LLMOps Like We Did MLOps".
Tweet media one
8
29
241
@charles_irl
Charles 🎉 Frye
8 months
Out of curiosity, I just looked at the PubMedQA dataset. It is _so_ noisy! The contents are messily scraped, 99.5% of the annotations are language model-generated -- and the LMs were BERTs!
10
24
238
@charles_irl
Charles 🎉 Frye
3 months
Tweet media one
@pli_cachete
Rota
3 months
I know lots of various central limit theorems, maxent arguments, convolving arguments, etc. but it’s still so bizarre to me just how universal Gaussian distributions are. Does anyone have intuition for why or interesting constraints that end up being equivalent to the Normal?
65
14
463
9
16
232
@charles_irl
Charles 🎉 Frye
1 year
Over the past month, I've been working to grok RWKV, one of the most successful challengers to Transformers for language modeling. I untangled numerical tricks from load-bearing math, assigned semantic names to one-letter variables, and debugged weird NaNs so you don't have to!
@full_stack_dl
The Full Stack
1 year
Is it the revenge of recurrent nets? Is it a subquadratic Transformer? It's both, it's neither, it's RWKV: @BlinkDL_AI's novel architecture that infers efficiently like an RNN but matches Transformer quality -- so far. Deep dive by @charles_irl:
2
63
231
3
31
227
@charles_irl
Charles 🎉 Frye
1 year
.@jeremyphoward put out a delightful tutorial this week on getting started with LLMs for a science QA Kaggle competition
unlike many other intros it emphasizes exactly the right thing: understand the data first, then the model, in the context of the task
1
38
219
@charles_irl
Charles 🎉 Frye
2 months
In this post, @eugeneyan, @BEBischof, @HamelHusain, @jxnlco, @sh_reya & I share our tactical tips for working with LLMs, from structured outputs to caching
Stay tuned for two more posts covering the operational (hiring, product) & strategic (durability, competition) perspectives
Tweet media one
@BEBischof
Bryan Bischof fka Dr. Donut
2 months
Proud to bring you: A Year Building With LLMs, a three part essay published with O'Reilly
We get into the weeds on what it takes to develop incredible LLM powered applications
Advice from: @eugeneyan, @charles_irl, @HamelHusain, @jxnlco, @sh_reya and I.
10
51
239
2
23
207
@charles_irl
Charles 🎉 Frye
2 years
im looking to start an interest group crossing over @full_stack_dl + @ml_collective!
we'll work through long-form content (h/t @chipro + @sh_reya) first, w sync discussions weekly to keep us on track
async folks can chat on discord, contribute to a wiki, + catch the recordings
Tweet media one
6
28
205
@charles_irl
Charles 🎉 Frye
2 months
"opinions expressed are solely my own and do not express the views or opinions of my employer"
Tweet media one
@HamelHusain
Hamel Husain
2 months
🤣
Tweet media one
34
5
209
15
6
206
@charles_irl
Charles 🎉 Frye
5 months
really love the CMU Database Group's online courses and the lectures on distributed systems from Martin Kleppmann -- both available on YouTube what's your favorite operating systems course that's available in the same format? ideally a university course, at least 8 hours
8
14
201
@charles_irl
Charles 🎉 Frye
30 days
full house showed up to hear @HamelHusain say "look at your data" over and over for 20 min
Tweet media one
Tweet media two
@charles_irl
Charles 🎉 Frye
1 month
pov: you are about to be told to look at your data
Tweet media one
5
2
162
4
10
189
@charles_irl
Charles 🎉 Frye
9 months
great new post from @nelhage on "what's the deal with pickle in ML" that has a ton of great insights on research code in general ignore this while implementing LLMOps tooling to your peril
Tweet media one
12
13
184
@charles_irl
Charles 🎉 Frye
1 year
1) Attention heads execute dot-product vector lookup on a key-value store constructed from the token sequence and the head weights. 2) Redis is a key-value store that supports dot-product vector lookup. Behold, RedisAttend:
Tweet media one
8
11
183
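The first claim in the tweet above, that an attention head performs a dot-product lookup on a key-value store, can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the RedisAttend code: the `attend` and `softmax` helpers and the hand-picked `keys` and `values` are hypothetical, and the learned projections that build keys and values from the token sequence are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Soft key-value lookup: average the values, weighted by query-key match."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]


# Hypothetical toy store with two entries. In a real head, keys and values
# would be projections of the token sequence through learned weights.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

# A query aligned with the first key retrieves (almost exactly) the first value.
result = attend([10.0, 0.0], keys, values)

# A query that matches no key better than another returns the plain average.
blended = attend([0.0, 0.0], keys, values)
```

Unlike an exact-match dictionary lookup, every stored value contributes a little, which is why the lookup stays differentiable.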
@charles_irl
Charles 🎉 Frye
9 months
@karpathy vibing with the taste of this sauce, chef
but what do we do about
- kernelland vs userland (isolation, scheduling)
- interrupt handlers (representing clock time, boundary between peripherals and processor, etc)
5
0
172
@charles_irl
Charles 🎉 Frye
2 years
last week @modal_labs made A100 GPUs available so on Friday i dropped everything to play with them in hours i had a CLI tool that could make @StabilityAI art of the new puppy in my life, Qwerty by Sunday i had multiple autoscaling pet-art-generating web apps -- and so can you!
Tweet media one
7
22
173
@charles_irl
Charles 🎉 Frye
5 months
✨ ai engineering ✨
Tweet media one
6
12
169
@charles_irl
Charles 🎉 Frye
5 months
For my first official contribution to the @modal_labs examples: running Gemma 7B on an H100 at >2500 tok/s 🚀 With very little effort, that's already just ~75¢ per megatoken -- and you have full "tensors-and-a-shell" control over the execution environment
6
20
164
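The two figures in the tweet above (>2500 tok/s and ~75¢ per megatoken) are linked by simple arithmetic, sketched here. The tweet does not state an hourly GPU price, so the sketch backs one out from the two quoted numbers. That value is an implied figure for illustration only, not a published rate, and both function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
TOKENS_PER_MEGATOKEN = 1_000_000


def cost_per_megatoken(price_per_hour, tokens_per_second):
    """Dollars to generate one million tokens at a steady throughput."""
    seconds_needed = TOKENS_PER_MEGATOKEN / tokens_per_second
    return price_per_hour * seconds_needed / SECONDS_PER_HOUR


def implied_price_per_hour(cost_per_mtok, tokens_per_second):
    """Invert the formula: the hourly price consistent with a quoted cost."""
    seconds_needed = TOKENS_PER_MEGATOKEN / tokens_per_second
    return cost_per_mtok * SECONDS_PER_HOUR / seconds_needed


# At 2500 tok/s a megatoken takes 400 seconds, so $0.75 per megatoken
# corresponds to an implied rate of about $6.75 per hour (an assumption
# derived from the tweet's numbers, not a quoted price).
hourly = implied_price_per_hour(0.75, 2500)
round_trip = cost_per_megatoken(hourly, 2500)
```

Since the tweet says ">2500", any throughput above that pushes the per-megatoken cost below the quoted figure at the same hourly rate.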
@charles_irl
Charles 🎉 Frye
1 year
this Independence Day, i am celebrating our future independence from the Transformer architecture 🎆
Tweet media one
Tweet media two
Tweet media three
9
9
156
@charles_irl
Charles 🎉 Frye
8 months
I'm a big fan of the RASP line of work that's building up a theoretical model of Transformer computations. It's notoriously hard to grok -- but I feel like it's just not inaccessible _enough_. To that end I'm introducing raskell, RASP-L in Haskell:
@oh_that_hat
Hattie Zhou
9 months
What algorithms can Transformers learn? They can easily learn to sort lists (generalizing to longer lengths), but not to compute parity -- why? 🚨📰 In our new paper, we show that "thinking like Transformers" can tell us a lot about which tasks they generalize on!
Tweet media one
16
262
1K
4
11
152
@charles_irl
Charles 🎉 Frye
2 months
years later, it's really happening -- i find myself swapping tips on training neural networks by looking at @weights_biases runs, totally organically and effortlessly (this is in the @HamelHusain x @dan_s_becker course discord, btw)
Tweet media one
5
14
148
@charles_irl
Charles 🎉 Frye
1 year
Took a look through the student projects from @jim_dowling's recent class on "Scalable ML and DL" and man, it's impressive how far you can go with contemporary tools like @modal_labs, @huggingface, and @hopsworks!
Tweet media one
Tweet media two
2
30
145
@charles_irl
Charles 🎉 Frye
1 year
It can be done! It should not be done! I christen this pattern the "TuringCompletion": it turns ChatCompletion-with-functions into an endpoint for executing programs written by LMs on-the-fly. (Don't) Try It Yourself:
Tweet media one
@charles_irl
Charles 🎉 Frye
1 year
we've seen folks hack @OpenAI function calls to produce objects (by exposing "fake" functions) and produce DAGs (by exposing a "query planner") DAGs are nice, but they are not arbitrary programs so what about a function call that edits the functions available on future calls?
3
5
66
7
25
140
@charles_irl
Charles 🎉 Frye
9 months
if you don't like it, please submit your complaints as PRs to stay_mad.yaml
Tweet media one
5
3
140
@charles_irl
Charles 🎉 Frye
5 months
"Let me tell you, I've made billions and billions of deals using C, C++. These languages are tremendous, absolutely tremendous. Now, they're trying to sell us this 'woke' idea of memory safety. 'Oh, we need to be safe, we can't have these buffer overflows.'"
Tweet media one
@abhi9u
Abhinav Upadhyay
5 months
Looks like the presidential debate is going to be on C vs Rust, memory unsafe vs safe languages. Everything else is immaterial.
11
9
150
4
10
142
@charles_irl
Charles 🎉 Frye
5 years
1/ Last semester, I taught a course on computational Bayesian inference in #pymc aimed at novice #Python-istas and budding #DataScience folk. Though not perfect, it is now complete and available for anyone, install-free: . 🧵⤵️, some of my favorite parts:
Tweet media one
3
51
143
@charles_irl
Charles 🎉 Frye
2 months
LLMs are often called "non-deterministic". This is not strictly true. They can be configured to be as deterministic as other software. We are however subject to _epistemic_ uncertainty: outputs are subjectively unpredictable. Epistemic uncertainty is resolved by experiment.
Tweet media one
9
19
141
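The point above about configurable determinism can be illustrated with a toy decoder. Everything here is a hypothetical stand-in: `toy_logits` replaces a real model with a pure function of the context, and `generate` is an assumed helper name. Real serving stacks may have other sources of variation that this sketch ignores.

```python
import math
import random


def toy_logits(context):
    """Hypothetical stand-in for a model: logits depend only on the context."""
    offset = sum(context)
    return [math.sin(offset + i) for i in range(5)]


def sample_token(logits, temperature, rng):
    """Greedy argmax at temperature 0, otherwise sample from the softmax."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    weights = [math.exp(logit / temperature) for logit in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(prompt, n_tokens, temperature, seed):
    """Decode n_tokens, drawing all randomness from one seeded generator."""
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(n_tokens):
        context.append(sample_token(toy_logits(context), temperature, rng))
    return context[len(prompt):]


# Greedy decoding never touches the random generator, so the seed is irrelevant.
greedy_a = generate([1, 2], 8, temperature=0, seed=0)
greedy_b = generate([1, 2], 8, temperature=0, seed=123)

# Sampling is reproducible once the seed is pinned.
sampled_a = generate([1, 2], 8, temperature=1.0, seed=42)
sampled_b = generate([1, 2], 8, temperature=1.0, seed=42)
```

Both pairs match run after run, yet nothing about the code tells you in advance which tokens come out. That gap is the epistemic uncertainty the tweet describes, and running the experiment is how it gets resolved.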
@charles_irl
Charles 🎉 Frye
8 months
why are there so many highly-paid engineers who specialize in staves? what could possibly be so difficult about staff engineering? they're literally just sticks.
20
5
139
@charles_irl
Charles 🎉 Frye
3 months
love @AnthropicAI Claude, but it doesn't have an interpreter to execute code it writes but there's a code sandbox widget in -- which enables even more complex workflows, like having Opus write tests that Haiku writes code to pass, with code review by GPT4
5
13
133
@charles_irl
Charles 🎉 Frye
9 days
I'm back in the webinar game! My first official webinar for @modal_labs is in two weeks, and we'll cover one of the things I've enjoyed the most since joining Modal: making LLM inference go brrrt. Plus: $250 in credits to all attendees💚
Tweet media one
3
10
132
@charles_irl
Charles 🎉 Frye
1 year
i saw the best minds of my generation destroyed by fine-tuning
2
9
127
@charles_irl
Charles 🎉 Frye
9 months
My "AI Engineering 201" talk from @aiDotEngineer summit goes live soon!
Topics:
- the future of open & closed models
- constraints for AI deployment targets: edge, browser, server, serverless
- memory-bound vs compute-bound inference
come check it out
2
13
124
@charles_irl
Charles 🎉 Frye
2 years
in a delightful turn of events, i'm working on an OpenAPI AI using the OpenAI API 🤕
7
6
122
@charles_irl
Charles 🎉 Frye
9 months
Part II of AI Engineering 201, "The Rest of the F*cking Owl", is now up!
Part I was a two hour deepdive into inference and models
Part II is one hour on "everything else" in a whole LLM product: retrieval, cognitive architectures, monitoring, eval, &cet
2
13
120
@charles_irl
Charles 🎉 Frye
8 months
For every ten likes this gets, I will ask ChatGPT to add more people beating this dead horse.
Tweet media one
@charles_irl
Charles 🎉 Frye
8 months
"That's a great picture, but now make the beaten horse even deader!"
2
1
24
8
5
120
@charles_irl
Charles 🎉 Frye
9 days
cramming more cognition into integrated circuits
Tweet media one
5
19
122
@charles_irl
Charles 🎉 Frye
2 months
i will literally pay you to learn how to work with LLMs
@HamelHusain
Hamel Husain
2 months
Amazing news re: our LLM fine-tuning course: All students get $1,000 in free compute credits from @modal_labs and @replicate ($500 each) 💰 Course signups close end-of-day today. You get more compute credits than the course costs 🤯
16
21
167
5
7
119
@charles_irl
Charles 🎉 Frye
7 months
Great way to try out MMLU and get a sense for just what, exactly, we are using to evaluate LLMs! I doubt folks would knife fight for a percent on this benchmark if its contents were realized more broadly.
Tweet media one
5
18
117
@charles_irl
Charles 🎉 Frye
5 months
if you're not doing cloud-native development on @modal_labs for a @weaviate_io vector DB-backed, @huggingface embeddings-based recsys app while riding in the back of a @Waymo, are you really living in 2024?
Tweet media one
7
14
116
@charles_irl
Charles 🎉 Frye
3 months
when i looked back at alexnet again in ~2020 and noticed it had model parallelism, i realized that i really needed to spend less time on mathematics and more on software engineering
@karpathy
Andrej Karpathy
3 months
# CUDA/C++ origins of Deep Learning Fun fact many people might have heard about the ImageNet / AlexNet moment of 2012, and the deep learning revolution it started. What's maybe a bit less known is that the code backing this winning submission to the
Tweet media one
166
900
7K
3
9
115
@charles_irl
Charles 🎉 Frye
4 months
swapping in H100s for A100s, immediate 2x speedup. god bless the hardware folks
@charles_irl
Charles 🎉 Frye
4 months
got dbrx running pretty fast on @modal_labs what should i do next?
7
3
70
2
7
114
@charles_irl
Charles 🎉 Frye
1 month
Dropped a new walkthrough on the @modal_labs docs -- how to turn any Python function into a @FastAPI endpoint on Modal with two lines of code.
Tweet media one
3
4
113
@charles_irl
Charles 🎉 Frye
9 months
This is why I think anyone trying to ship LLM features in the next six months should be focusing on code! We'll handle hallucinations, evaluation, etc. for tougher cases like law & medicine once we've learned the hard lessons in software/easy mode.
@simonw
Simon Willison
9 months
@Grady_Booch If anything, code is a better application for LLMs than most other fields of information work - because hallucinations in code can be "fact checked" by running that code Much easier to spot hallucinated code than hallucinated facts in prose
1
15
81
1
5
108
@charles_irl
Charles 🎉 Frye
7 months
I was deeply confused by async programming until I got my hands dirty and played around with epoll (and kqueue on OS X). Check out this article for a nice overview!
4
6
110
@charles_irl
Charles 🎉 Frye
4 years
This tweet blew up! What a great time to remember that #BreonnaTaylorWasMurdered and #BreonnaTaylorsKillersAreFree. This call to action, on the occasion of the birthday stolen from her by white supremacist police violence, got no-knock warrants banned. We must keep going!
Tweet media one
1
11
108
@charles_irl
Charles 🎉 Frye
2 years
The usual way to set up a DL model for regression (e.g. autoencoding) doesn't include any uncertainty quantification, unlike the default way to do classification. What if, for targets that have bounded values, we just turn it into a classification problem?
21
6
105
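The question in the tweet above, whether a bounded regression target can be recast as classification, can be sketched as follows. This is one assumed way to do it, not the author's method: the bounded range is cut into equal-width bins, the network would emit one logit per bin, and the resulting distribution yields a point estimate plus a spread. All helper names and the choice of equal-width bins are hypothetical.

```python
import math


def bin_centers(low, high, n_bins):
    """Midpoints of n_bins equal-width bins covering [low, high]."""
    width = (high - low) / n_bins
    return [low + (i + 0.5) * width for i in range(n_bins)]


def target_to_class(y, low, high, n_bins):
    """Map a bounded regression target to the index of its bin."""
    width = (high - low) / n_bins
    index = int((y - low) / width)
    return min(max(index, 0), n_bins - 1)  # clamp so y == high stays in range


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits, centers):
    """Point estimate and uncertainty read off the predicted distribution."""
    probs = softmax(logits)
    mean = sum(p * c for p, c in zip(probs, centers))
    variance = sum(p * (c - mean) ** 2 for p, c in zip(probs, centers))
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return mean, math.sqrt(variance), entropy


centers = bin_centers(0.0, 1.0, 4)
label = target_to_class(0.5, 0.0, 1.0, 4)

# Flat logits mean "no idea": wide spread and maximal entropy.
unsure = summarize([0.0, 0.0, 0.0, 0.0], centers)
# A sharply peaked prediction has a much smaller spread.
confident = summarize([0.0, 0.0, 20.0, 0.0], centers)
```

The trade-off is resolution: predictions can be no finer than the bin width, so the bin count becomes a hyperparameter.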
@charles_irl
Charles 🎉 Frye
4 years
There's a theory out there that neural networks are easy to train because their loss f'n is "nice": no bad local minima. Recent work has cast doubt on this claim on analytical grounds. In new work, we critique the numerical evidence for this claim. 🧵⤵
1
27
104
@charles_irl
Charles 🎉 Frye
2 years
"Can Copilot produce bug-free code without human review?" "Can a diffusion model generate a truly novel piece of art?" "Can ChatGPT determine when it is uncertain and calibrate the confidence of its tone appropriately?"
Tweet media one
5
12
104
@charles_irl
Charles 🎉 Frye
3 months
great, detailed article by @truskovskiy on setting up an LLM fine-tuning pipeline
Tweet media one
0
7
106
@charles_irl
Charles 🎉 Frye
1 year
🦜🥞 intensifies
Tweet media one
@charles_irl
Charles 🎉 Frye
1 year
🦜🥞
Tweet media one
3
5
62
5
2
100
@charles_irl
Charles 🎉 Frye
16 days
It's been awesome watching @andersonbcdefg cook using Modal! "The ability to spawn jobs from one Python environment that run in a totally different Python environment (different packages, more CPUs, GPUs, etc.) is a great benefit to machine learning engineers."
Tweet media one
@TryTaylor_AI
Taylor
16 days
We are so excited to share more about our partnership with @modal_labs, a serverless platform for high-performance computing. Check out our blog for how we use Modal to build & deploy high accuracy text classification models for our users.
0
5
23
1
14
102
@charles_irl
Charles 🎉 Frye
9 months
missed this reverse engineering of macOS Sonoma's Transformer-based autocorrect a few months back! GPT-1 style, but still interesting if you're thinking about how to bundle these things in OSes/browsers/native apps
3
12
103
@charles_irl
Charles 🎉 Frye
6 months
High-signal set of talks on ML X DBs from the CMU database group. I particularly recommend the talk on @postgresml from Montana Low. Condensed nuggets of ML operations wisdom from years in the trenches, plus a vision of the future.
Tweet media one
1
7
103
@charles_irl
Charles 🎉 Frye
5 months
dang that's a slick lookin product site @LangChainAI
Tweet media one
12
6
101
@charles_irl
Charles 🎉 Frye
1 year
1
0
101
@charles_irl
Charles 🎉 Frye
2 months
My personal favorite in this section: "the rumors of RAG's demise are greatly exaggerated."
Tweet media one
@HamelHusain
Hamel Husain
2 months
My colleagues and I distilled practical advice re: LLMs into this three-part series. Lots of bangers. One of my favorite excerpts from this part in the screenshot
Advice from: @eugeneyan, @BEBischof, @charles_irl, @sh_reya, @jxnlco and myself
See:
Tweet media one
15
55
433
4
9
100
@charles_irl
Charles 🎉 Frye
2 years
Nice! Just ran Whisper and did a quick comparison with the transcription tool in Descript. Competitive accuracy results from Whisper, possibly a bit better. That means with just ~3 clicks, I was running SotA audio transcription, in a UI, on an accelerator, all entirely for free!
Tweet media one
@_akhaliq
AK
2 years
The @Gradio Demo for @OpenAI Whisper, a general-purpose speech recognition model is out on @huggingface Spaces demo: colab:
Tweet media one
3
74
271
3
11
101
@charles_irl
Charles 🎉 Frye
4 months
anti-proprietary model protestors outside of the NVIDIA GTC keynote venue
Tweet media one
7
4
100
@charles_irl
Charles 🎉 Frye
2 years
man, this tweet aged very well
@gneubig
Graham Neubig
4 years
A question I had about the GPT-3 paper. It seems that some smaller models are not converged, and are given significantly less compute than the 175B parameter model. I wonder if 175B params are actually necessary, or the smaller models just needed to be trained longer?
Tweet media one
4
12
113
3
4
97
@charles_irl
Charles 🎉 Frye
1 year
Computer programs are being used as increasingly plausible models of human cognition. They are disturbingly simple & heuristic. They play chess, write proofs, and even pass some exams. Augmented with external memory, they can help you pick investments! The year is 1965.
Tweet media one
5
18
97
@charles_irl
Charles 🎉 Frye
2 months
The second part of "What @eugeneyan, @BEBischof, @HamelHusain, @sh_reya, @jxnlco, and I Learned from a Year of Building with LLMs" is now available. We zoom out from technical tactics to talk operations: team culture, product discipline, and more.
4
20
98
@charles_irl
Charles 🎉 Frye
9 months
@karpathy hmm, unless we can control models more tightly than we currently do, the system message/user message distinction feels like a weak form of security would hate to lose my ssh keys because someone threatened imaginary orphans while pinging my LLM kernel
7
1
94
@charles_irl
Charles 🎉 Frye
2 months
icymi: today in @HamelHusain x @dan_s_becker's LLM Fine-Tuning Course, we distributed an extra $500 in @modal_labs credits to the 338 students who had used the platform. live. using a script that ran on modal.
Tweet media one
@eugeneyan
Eugene Yan
2 months
@charles_irl insane flex by @modal_labs and @charles_irl giving students who've tried modal ANOTHER $500 credit
Tweet media one
3
4
48
9
6
97
@charles_irl
Charles 🎉 Frye
2 years
another fun session of the @ml_collective x @full_stack_dl reading group on "Designing ML Systems", by @chipro
great discussion on Chapter 3, "Training Data", with a focus (ha!) on focal loss and on synthetic data
4
9
93
@charles_irl
Charles 🎉 Frye
9 months
hope i am proven wrong in my fear that "GPTs" will 100x this tech's reputation for vaporous demoware
6
2
93
@charles_irl
Charles 🎉 Frye
1 year
This post is absolutely wild -- including demonstration of non-determinism in prod GPT-3.5 series models at temperature 0! We've finally got some "Intriguing Properties" of LLMs.
5
10
91
@charles_irl
Charles 🎉 Frye
4 years
new short course of 3 video lectures soon with @weights_biases , covering the core ideas from math that you need in order to do ML -- or at least, my hot takes on them 🔥 catch the daily premieres starting next Tues 1/12 at 830a PST / 430p GMT / 1000p IST
Tweet media one
1
15
91
@charles_irl
Charles 🎉 Frye
2 years
channeling @karpathy's legendary documentation header image from minGPT
Tweet media one
1
3
88
@charles_irl
Charles 🎉 Frye
3 years
the final video for the @weights_biases Math4ML series, on probability, is now up on YouTube! @_ScottCondron and I talk entropies, divergence, and loss functions 🔗:
2
15
86
@charles_irl
Charles 🎉 Frye
3 months
neat result from the appendix: MMLU performance is more tightly correlated with ability to compress textbooks (right) than web text (left)
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
3 months
Compression Represents Intelligence Linearly LLMs' intelligence – reflected by average benchmark scores – almost linearly correlates with their ability to compress external text corpora repo: abs:
Tweet media one
9
76
463
6
4
86
@charles_irl
Charles 🎉 Frye
4 months
repping @modal_labs at Burning FLOPs come say hi!
Tweet media one
0
4
86
@charles_irl
Charles 🎉 Frye
8 months
wait so now we do rendering on the server and routing on the client? not sure how to react
10
3
83
@charles_irl
Charles 🎉 Frye
9 months
lil thread in here of suggested resources for folks new to Rust/systems programming, coming from Python/JS. titles to entice: From Python To Rust, Rust By Example, Learning Rust With Entirely Too Many Linked Lists, Writing an OS in Rust
@charles_irl
Charles 🎉 Frye
9 months
@jxnlco @vagabondjack I like RBE: Once you've got that under your belt, try this walkthrough for writing a kernel in Rust:
1
3
41
2
8
84
@charles_irl
Charles 🎉 Frye
1 year
Prompt engineering? My brother in Christ, you must first concern yourself with engineering promptly.
3
11
84
@charles_irl
Charles 🎉 Frye
1 year
complaining about LangChain in production is like complaining about Excel in production not even wrong
2
6
82
@charles_irl
Charles 🎉 Frye
2 years
successful first edition of the @ml_collective x @full_stack_dl reading group on @chipro's "Designing ML Systems" book!
1
13
81
@charles_irl
Charles 🎉 Frye
1 year
that feeling when a slide really comes together
Tweet media one
9
10
82
@charles_irl
Charles 🎉 Frye
1 year
yesss
@_akhaliq
AK
1 year
This is wild Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings project
8
109
548
1
13
81
@charles_irl
Charles 🎉 Frye
1 year
The hardest part about literature reviews is that cool papers tend to themselves cite cool papers. So even as the "done" list has grown, the "todo" list has grown alongside it 🥹
Tweet media one
5
3
82
@charles_irl
Charles 🎉 Frye
1 year
so glad i can be here for the iBeer era of generative model apps
Tweet media one
3
5
78
@charles_irl
Charles 🎉 Frye
5 years
gradient descent isn't perfect, but it's a step in the right direction
0
9
80
@charles_irl
Charles 🎉 Frye
1 year
@sahewat psychedelize normie leadership
1
5
78
@charles_irl
Charles 🎉 Frye
4 years
super proud of the team for the @weights_biases youtube channel, which just hit 10,000 subscribers 🎉
Tweet media one
3
2
79