Joel Kronander

@jkronand

Followers: 3,093
Following: 998
Media: 75
Statuses: 2,149

I try to learn something every day ✨ Former Head of ML at Nines, former Head of Synth Data at Scale AI 💫

Palo Alto, CA
Joined March 2008
@jkronand
Joel Kronander
1 year
I recently demonstrated GPT4 to my spouse's 101-year-old grandfather, who remains in excellent health and has a sharp mind. Following my demonstration, he paused thoughtfully and then said something I will remember — “This technology instills hope for our future. It's high time
41
121
1K
@jkronand
Joel Kronander
10 months
Stanford's DSPy is the best high-level LLM programming framework I have seen so far. Langchain never resonated with me; despite being an early LLM framework, its design and abstractions felt overly complex. DSPy, on the other hand, is a huge step in the right direction. DSPy
Tweet media one
11
95
727
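A minimal sketch of what a DSPy program looks like, assuming an OpenAI API key in the environment; the model name is illustrative and the client class names vary across DSPy releases (newer versions use dspy.LM):

    import dspy

    # Configure a language model backend (illustrative model name).
    lm = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=300)
    dspy.settings.configure(lm=lm)

    # Declare *what* we want (a signature), not how to prompt for it.
    class AnswerQuestion(dspy.Signature):
        """Answer the question with a short factual answer."""
        question = dspy.InputField()
        answer = dspy.OutputField(desc="a short answer")

    # Let DSPy handle the prompting strategy (here: chain of thought).
    qa = dspy.ChainOfThought(AnswerQuestion)
    print(qa(question="Who introduced the transformer architecture?").answer)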
@jkronand
Joel Kronander
1 year
An interesting new Nature paper compares fMRI recordings with activations across layers in a language model, and finds evidence of correlations. The study seems to suggest that brain regions located at the top of the language hierarchy, responsible for
21
134
635
@jkronand
Joel Kronander
1 year
Six years ago, Geoffrey Hinton asserted that AI would take over radiology within five years, suggesting we cease training radiologists. Was he correct? The situation is more complex than simply being right or wrong. While AI has surpassed radiologists in certain diagnostic
28
55
464
@jkronand
Joel Kronander
1 year
Deep learning is typically bottlenecked by memory, not compute. ⚡️Flash Attention⚡️ optimizes transformers, like GPT, to minimize costly GPU memory fetches: it achieves impressive speedups of 2-4x, is 5-20x less memory intensive, and enables scaling to longer
Tweet media one
5
50
287
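A minimal sketch of using a FlashAttention-style kernel, assuming PyTorch >= 2.0 on a supported GPU; this is not the paper's original code, and the kernel-selection context manager has been renamed in newer PyTorch releases:

    import torch
    import torch.nn.functional as F

    # (batch, heads, seq_len, head_dim) in fp16 on the GPU, as the flash kernel expects.
    q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # Force the FlashAttention backend so the full attention matrix is never materialized in HBM.
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # same shape as q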
@jkronand
Joel Kronander
1 year
Self-consistency is underrated for improving the accuracy of LLMs on a range of reasoning and arithmetic tasks. It works with any off-the-shelf LLM, e.g. GPT-3 variants, and also provides estimates of how certain the LLM is of the provided answer. Takeaways👇
Tweet media one
7
49
268
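A minimal sketch of the self-consistency idea, where `sample_answer` stands in for any function that queries an LLM at temperature > 0 and parses out the final answer (a hypothetical helper, not a library call):

    from collections import Counter

    def self_consistency(sample_answer, question, n=10):
        """Sample n chain-of-thought completions and majority-vote the final answer."""
        answers = [sample_answer(question) for _ in range(n)]
        best, votes = Counter(answers).most_common(1)[0]
        confidence = votes / n  # the agreement rate doubles as a rough certainty estimate
        return best, confidence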
@jkronand
Joel Kronander
1 year
A simple trick to make LLMs “calibrated” — i.e. “to know when they don’t know something” — is to reformulate the answer as a single word or a short phrase and look at the predicted logprobs of that word. As LLMs are trained to predict the probability of the next token, they are
Tweet media one
8
34
261
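A minimal sketch of reading off that single-word probability, assuming the legacy OpenAI Completions endpoint with logprob support; the model name is illustrative and the newer Chat API exposes logprobs differently:

    import math
    import openai  # legacy (pre-1.0) client assumed

    prompt = "Q: What is the capital of Australia? Answer with one word.\nA:"
    resp = openai.Completion.create(
        model="text-davinci-003",  # illustrative completion model with logprob support
        prompt=prompt,
        max_tokens=1,
        temperature=0,
        logprobs=5,                # also return the top-5 candidate tokens
    )
    # The probability assigned to the answer token is the calibration signal.
    for token, logprob in resp["choices"][0]["logprobs"]["top_logprobs"][0].items():
        print(f"{token!r}: p = {math.exp(logprob):.3f}")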
@jkronand
Joel Kronander
1 year
🤖️LLMs can self-improve 🧠 1) Self-consistency boosts reasoning skills by sampling multiple paths & finding the most consistent answer. But more samples = more compute requirements. 💻 2) However, we can train a better LLM on the self-generated solutions from 1)
7
55
249
@jkronand
Joel Kronander
1 year
What if you had trained a model to play legal moves in Othello by predicting the next move, and found that it had spontaneously learned to compute/represent the full board state in its weights - an emergent world representation? That's just what this
7
33
199
@jkronand
Joel Kronander
1 year
Insightful paper that succinctly covers essential high-level knowledge to keep in mind regarding LLMs: - Large language models (LLMs) predictably improve with increasing investment, but many key behaviors emerge unpredictably. - LLMs often learn and use representations of the
Tweet media one
7
15
158
@jkronand
Joel Kronander
1 year
Tweet media one
3
19
131
@jkronand
Joel Kronander
1 year
✨Neat LLM trick for 📈 math & logical abilities ✨ Improves on Chain of Thought (CoT) prompting by 1) Replacing the natural-language, step-by-step instructions in CoT examples with commented, stepwise Python code 2) Running the code. Several recent papers cover this (see refs below⬇️)
Tweet media one
4
24
135
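A minimal sketch of this program-of-thought pattern; `llm_complete` is a hypothetical prompt-to-code function rather than a specific library API, and real use would need a sandboxed executor:

    PROMPT = """# Q: A store sells pens at $3 each. How much do 17 pens cost?
    # Solve with stepwise Python, storing the result in `answer`.
    price_per_pen = 3
    num_pens = 17
    answer = price_per_pen * num_pens

    # Q: {question}
    # Solve with stepwise Python, storing the result in `answer`.
    """

    def solve(llm_complete, question):
        code = llm_complete(PROMPT.format(question=question))  # the model returns Python, not prose
        scope = {}
        exec(code, scope)  # executing model-written code: sandbox this in practice
        return scope.get("answer")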
@jkronand
Joel Kronander
2 years
Want to know a simple trick that helps LLMs generate more plausible long documents, break out of repetition, and more sensibly truncate low-probability tokens? Learn about LLM truncation sampling! Some takeaways from 👇🧵
Tweet media one
3
16
122
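As a concrete example, here is a minimal sketch of one common truncation scheme, top-p (nucleus) sampling, which drops the low-probability tail before sampling; it is one member of the family, not necessarily the specific method discussed in the thread:

    import torch

    def nucleus_sample(logits, p=0.9):
        """Sample a token id, keeping only the smallest set of tokens whose probability mass reaches p."""
        probs = torch.softmax(logits, dim=-1)
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        keep = (cumulative - sorted_probs) < p      # the top token is always kept
        kept = sorted_probs * keep
        kept = kept / kept.sum()                    # renormalize the truncated distribution
        choice = torch.multinomial(kept, num_samples=1)
        return sorted_idx[choice]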
@jkronand
Joel Kronander
10 months
@ylecun GPT4 would never make that mistake.
4
2
116
@jkronand
Joel Kronander
1 year
LLMs suffer from overconfidence and poorly calibrated uncertainty estimates. However, self-consistency, where one samples multiple paths & finds the most consistent answer, seems to offer a practical solution. Interesting figure from page 4 of the "LLMs can self-improve" paper
Tweet media one
2
8
111
@jkronand
Joel Kronander
2 years
A fun trick for zero-shot retrieval tasks with great results! First use an off-the-shelf LLM to generate a set of hypothetical candidate documents, then use a standard embedding model + standard search to find the best-matching documents in a DB/the web. Details 👇
3
13
107
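A minimal sketch of this generate-then-embed idea (often called HyDE); `llm_complete` and `embed` are hypothetical callables standing in for whatever LLM and embedding model you use:

    import numpy as np

    def hypothetical_document_search(llm_complete, embed, corpus_embeddings, corpus_docs, query, k=5):
        """Embed an LLM-written hypothetical answer and use it for ordinary nearest-neighbour search."""
        hypothetical = llm_complete(f"Write a short passage that answers: {query}")
        q = embed(hypothetical)                                   # shape (d,)
        sims = corpus_embeddings @ q / (
            np.linalg.norm(corpus_embeddings, axis=1) * np.linalg.norm(q) + 1e-9
        )
        top = np.argsort(-sims)[:k]
        return [corpus_docs[i] for i in top]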
@jkronand
Joel Kronander
1 year
Distill step by step! A new research paper from Google presents a straightforward concept that lets them train a 770M T5 model that surpasses the 540B PaLM model, using just 80% of the available data on a benchmark task. Essentially distills (trains) a smaller model from the
Tweet media one
0
21
102
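A hedged sketch of the two-task training objective behind the idea: the student is trained both to predict the label and to reproduce the teacher LLM's rationale. The `student` model interface (HuggingFace-style, returning `.loss`) and the batch keys are illustrative assumptions, not the paper's code:

    import torch

    def distill_step_by_step_loss(student, batch, alpha=0.5):
        # Task 1: "[label]"-prefixed input -> gold label tokens.
        label_loss = student(input_ids=batch["label_input_ids"],
                             labels=batch["label_targets"]).loss
        # Task 2: "[rationale]"-prefixed input -> the large model's rationale tokens.
        rationale_loss = student(input_ids=batch["rationale_input_ids"],
                                 labels=batch["rationale_targets"]).loss
        # Weighted sum of the two tasks; alpha trades label accuracy against rationale quality.
        return alpha * label_loss + (1 - alpha) * rationale_loss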
@jkronand
Joel Kronander
1 year
@tunguz In your dreams. I think something that big is more like 3-4 million in reality.
4
1
82
@jkronand
Joel Kronander
9 months
@bio_bootloader Now compare GDP growth :-)
2
0
72
@jkronand
Joel Kronander
1 year
Bad future: A single AI, RLHFd into “alignment” with a very narrow set of values determined by a very small set of people. Good future: Multiple, democratized AIs that are reasonably regulated, inducing diversity of thought, applied towards medicine, science and wisdom. A society
2
7
70
@jkronand
Joel Kronander
1 year
@BillAckman @PeterHotez Mr. Ackman, @BillAckman , I appreciate your lengthier posts and believe you can be a thoughtful individual. However, in this specific instance of sharing out-of-context clips with voiceovers, it appears to be an incredibly ineffective method of uncovering the truth. While I don't
22
4
67
@jkronand
Joel Kronander
1 year
The benefits of AGI are often associated with accomplishments such as "curing cancer" or other medical breakthroughs. However, it appears that relatively few people are actively working on AI specifically for medical applications. Instead, researchers in leading labs seem to be
22
4
59
@jkronand
Joel Kronander
9 months
It's intriguing to observe how several alt accounts, like @tszzl , @BasedBeffJezos , @AISafetyMemes etc, gain traction and influence a significant portion of prominent figures in the AI/tech industry, often shaping the direction of discussions. Shapers of collective consciousness.
5
2
48
@jkronand
Joel Kronander
1 year
Isn't it quite mind-boggling that the majority of humanity's collective thoughts and reasoning, in broad strokes, seem to be compressible down to just a few hundred gigabytes?
10
8
46
@jkronand
Joel Kronander
9 months
@jeremyphoward Also given their history of changing their structure - what is really preventing them from changing it again at a later point?
2
2
45
@jkronand
Joel Kronander
1 year
@BillAckman Are you helping by posting this compilation of out-of-context clips from various interviews? Why is it bad that scientists update their beliefs and advice when facts come in? I don’t know much about this particular person, but the personal attacks on him for trying to navigate,
40
0
36
@jkronand
Joel Kronander
1 year
@bio_bootloader Keep parents on alert at all times
0
0
40
@jkronand
Joel Kronander
1 year
@emollick We got to hide this from the AGI
2
1
36
@jkronand
Joel Kronander
1 year
@bag_of_ideas Interesting point. I haven’t thought about it this way before — it’s also my belief that on average people are good.
0
1
30
@jkronand
Joel Kronander
2 years
Wonderful short survey of Graph Neural Networks (GNNs). Three principal task types: node classification, graph classification, and link prediction. Deep Sets and Transformers as GNNs, geometric graphs and more!
Tweet media one
0
7
30
@jkronand
Joel Kronander
1 year
Concise reference sheet for some of the most practical prompt techniques for improving LLMs on math and reasoning tasks.
@johnjnay
John Nay
1 year
Paper below has a good summary of the base techniques that work across domains:
Tweet media one
0
4
38
1
5
31
@jkronand
Joel Kronander
1 year
@dmvaldman I wonder how many custom NLP models out there easily could be replaced with an openai embedding and a linear classifier....
1
0
27
@jkronand
Joel Kronander
1 year
@karpathy I remember taking my first graduate-level machine learning course back in 2009 — and I got completely obsessed. Bishop's book on ML was my bible for a time; still a good book!
0
1
28
@jkronand
Joel Kronander
1 year
Re-reading a few chapters from my favorite ML/stats book! Beautifully written, peppered with deep insights, and doesn't shy away from the math, but doesn't complicate things unnecessarily. Also, you can get a free PDF here!
Tweet media one
1
1
29
@jkronand
Joel Kronander
1 year
Hard to predict exactly when, but it seems likely text2video with Stable Diffusion-like quality will happen sometime in the next 3 years. Could be in 5 months, could take a bit longer - but it's likely going to happen relatively soon. Let's make sure our defenses against misinformation are
5
0
26
@jkronand
Joel Kronander
1 year
In the paper they showed a 2-layer network (a non-linear probe) is needed to extract, and modify, the board state. But brilliant follow-up work seems to indicate you can actually just use a linear probe; also a great read!
1
1
27
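A minimal sketch of what such a linear probe looks like, assuming you already have the model's hidden activations and the corresponding board-square labels as arrays; the file names and label encoding are illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    hidden = np.load("othello_hidden_states.npy")   # (n_positions, d_model), assumed precomputed
    board = np.load("othello_board_labels.npy")     # (n_positions, 64), square state in {0, 1, 2}

    X_train, X_test, y_train, y_test = train_test_split(hidden, board, test_size=0.2, random_state=0)

    # One linear classifier per board square: if these read the state out accurately,
    # the board is (close to) linearly encoded in the model's activations.
    accs = []
    for square in range(board.shape[1]):
        probe = LogisticRegression(max_iter=1000).fit(X_train, y_train[:, square])
        accs.append(probe.score(X_test, y_test[:, square]))
    print("mean held-out probe accuracy:", float(np.mean(accs)))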
@jkronand
Joel Kronander
1 year
@koenvaneijk It says “open”ai on the box?
5
0
24
@jkronand
Joel Kronander
1 year
ChatGPT + Stable Diffusion is a pretty great condensed representation of humanity. A good candidate to send on the next deep-space Voyager probe as a greeting to any alien races out there. Aliens will learn that we are very self-confident and often wrong!
0
6
26
@jkronand
Joel Kronander
1 year
Uses simple but clever tricks like blocking/tiling and CUDA kernel fusion. It also recomputes the attention matrix dynamically in the backward pass instead of fetching it from memory. A beautiful example of impressive gains from clever engineering.
1
1
25
@jkronand
Joel Kronander
9 months
“My AI will circle back to yours to hash out the details”
2
5
24
@jkronand
Joel Kronander
2 years
@leopd Banning GPU sales to China increased the risk of a conflict around Taiwan unfortunately.
2
0
24
@jkronand
Joel Kronander
1 year
What’s the best paper investigating the effect of the order of training data fed to an LLM during training? Like keeping only high-quality content for later in training? Obviously works for finetuning, but I'm looking for a more generalized form.
5
4
22
@jkronand
Joel Kronander
1 year
From the GPT4 'paper' there is an interesting figure on how the base model is initially well calibrated on MMLU, but then after RLHF becomes much less so. Does anyone know of more studies of how RLHF affects model calibration on various tasks?
Tweet media one
2
3
23
@jkronand
Joel Kronander
1 year
@natfriedman Harder math & science problems. E.g. I find GPT-4 reliably and correctly solves exercises, with no public answers, in most of my graduate textbooks in math, physics, ML, etc. It's astounding if you actually try it.
3
2
22
@jkronand
Joel Kronander
1 year
@ylecun 6 months is basically nothing in the grand scheme of things. It’s rather irrelevant for technological progress. Seems reasonable to let the public catch up before people like you decide what should be done without consulting them first.
1
1
22
@jkronand
Joel Kronander
1 year
Fantastic talk summarizing recent progress in RLHF from the research lead of ChatGPT.
0
3
20
@jkronand
Joel Kronander
1 year
@KevinAFischer You can also see this effect directly in experimental data. The diversity of thought goes down in favor of the one “aligned” solution.
@jkronand
Joel Kronander
1 year
From the GPT4 'paper' there is an interesting figure on how the base model is initially well calibrated on MMLU, but then after RLHF becomes much less so. Does anyone know of more studies of how RLHF affects model calibration on various tasks?
Tweet media one
2
3
23
2
0
20
@jkronand
Joel Kronander
11 months
@tunguz Does anyone use integrals anymore?
0
0
21
@jkronand
Joel Kronander
1 year
@naval Extremely risky bet from Google. That statement doesn't even incorporate what will happen to the cost and capability of AI models in 1, 2, 3 years etc. Traditional search won't change that much. Google has an extremely difficult innovator's dilemma to navigate. In an organization
0
1
18
@jkronand
Joel Kronander
1 year
@jackythirdy Someone who pays $10/month for Copilot? Also VS Code ships with a dark theme by default, so it's democratizing the mythical 10x engineer :-)
0
0
20
@jkronand
Joel Kronander
1 year
@AlexanderRKlotz @DanielleFong Now compare to the list for four dimensions.
3
0
17
@jkronand
Joel Kronander
1 year
“It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers… They would be able to converse with each other to sharpen their wits. At some stage therefore, we should have to expect the machines to take control.” -
3
1
18
@jkronand
Joel Kronander
1 year
The first person to use the concept of a "singularity" in the technological context was John von Neumann. Stanislaw Ulam reports a 1958 discussion with von Neumann "centered on the accelerating progress of technology and changes in the mode of human life, which gives the
3
5
18
@jkronand
Joel Kronander
1 year
Great example of democratizing AI! A Stanford Alpaca-style LLM tuned for Italian instruction following. Go make one for the preferred language of your choice!
@teelinsan
Andrea Santilli
1 year
I'm excited to introduce Camoscio: an Italian instruction-tuned LLaMA, following Stanford Alpaca. The model should provide output of similar quality to GPT text-davinci-003 and has been finetuned by translating the Alpaca dataset to Italian. 1/3
8
40
180
0
1
18
@jkronand
Joel Kronander
10 months
@Teknium1 Translate a regular one to Python using your favorite LLM?
1
1
17