
Charles Foster
@CFGeek
Followers: 2K
Following: 15K
Media: 436
Statuses: 5K
Now posting ex cathedra🪄 Tensor-enjoyer 🧪 @METR_Evals. Occasionally writing at “Context Windows” on Substack. 🦋: https://t.co/rJecB0pvkA
Oakland, CA
Joined June 2020
We now have an interactive version of the time horizons graph (and the raw data) up on the METR website!
You can now find most of our measurements at the top of the blog post below in an interactive chart. We plan to keep this view up-to-date, periodically adding to it whenever we have new time-horizon measurements to share.
RT @lucafrighetti: How concerned should we be about AIxBio? We surveyed 46 bio experts and 22 superforecasters: If LLMs do very well on a…
“Oh, Frankenstein, be not equitable to every other, and trample upon me alone, to whom thy justice, and even thy clemency and affection, is most due. […] I am thy creature: I ought to be thy Adam; but I am rather the fallen angel, whom thou drivest from joy for no misdeed.”
Current AI “alignment” is just a mask. Our findings in @WSJ explore the limitations of today’s alignment techniques and what’s needed to get AI right 🧵
Many are working on making models think out loud in English. Fewer are working on interpreting how models think out loud in English. Almost no one is working on what to do if models think in harder-to-interpret ways. (I do doubt the claim in QT, though).
monitoring chain of thought is not going to lead to good understanding of how models think. understanding the internal activations and parameters of the model is much more fundamental and necessary to deeply understand AI. my sense is that restricting reasoning to coherent.
Appreciate that Janus (and others) are trying to explore LLMs on their “own terms” so to speak, instead of jumping to reshape LLMs into something more familiar and legible. I’ve maybe undervalued that in the past.
nostalgebraist has written a very, very good post about LLMs. if there is one thing you should read to understand the nature of LLMs as of today, it is this. I'll comment on some things they touched on below (not a summary of the post. Just read it.) 🧵