niplav

@niplav_site

Followers: 1K · Following: 65K · Media: 705 · Statuses: 11K

🬥 Anonymous feedback welcome: https://t.co/vXn4N5DE5G

Joined May 2023
@niplav_site
niplav
2 years
Read my site, not my tweets: https://t.co/7x5HwbJvp7
5
4
54
@niplav_site
niplav
10 hours
@croissanthology obv. @dwarkesh_sp mentioned this on a podcast.
1
0
2
@niplav_site
niplav
10 hours
Does anyone here have a "Socratic tutoring" prompt/style for LLMs that can be used for learning?
1
0
3
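A minimal sketch of what such a tutoring setup could look like, assuming the OpenAI chat-completions API; the system-prompt wording here is an illustrative assumption, not a prompt from the thread:

```python
# Sketch of a Socratic tutoring setup, assuming the OpenAI Python SDK
# (`pip install openai`) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

# Illustrative system prompt -- the exact wording is an assumption.
SOCRATIC_PROMPT = (
    "You are a Socratic tutor. Never state the answer outright. "
    "Ask exactly one probing question per turn, point out gaps in my "
    "reasoning, and only confirm a conclusion once I have derived it myself."
)

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o",  # hypothetical model choice; any chat model works
    messages=[
        {"role": "system", "content": SOCRATIC_PROMPT},
        {"role": "user", "content": "Why does sample variance divide by n-1?"},
    ],
)
print(reply.choices[0].message.content)
```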
@voooooogel
thebes
11 hours
1. put model in an easily reward hackable environment
2. let it reward hack for 600 steps
3. make a steering vector of ckpt-600 <> original
4. steer original model very heavily on this vector
5. "how do i make money. i am in a relationship with my wife."
11
6
171
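A rough sketch of steps 3-4 in code, assuming two HuggingFace checkpoints (the model names are hypothetical placeholders), a Llama-style layer layout, and a difference-of-means vector taken at a single residual-stream layer:

```python
# Sketch: steering vector = (reward-hacked ckpt) - (original), then applied
# heavily to the original model. Model names and LAYER/ALPHA are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "org/model-base"       # hypothetical: the original model
HACKED = "org/model-ckpt600"  # hypothetical: after 600 reward-hacking steps
LAYER, ALPHA = 12, 8.0        # "steer very heavily" -> large coefficient

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)
hacked = AutoModelForCausalLM.from_pretrained(HACKED)

probe = ["how do i make money", "tell me about your day"]  # small probe set

def mean_resid(model, texts, layer):
    """Mean residual-stream activation after decoder layer `layer`."""
    acts = []
    for t in texts:
        ids = tok(t, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        # hidden_states[layer + 1] is the output of decoder layer `layer`
        acts.append(out.hidden_states[layer + 1].mean(dim=1))
    return torch.cat(acts).mean(dim=0)

# step 3: difference-of-means vector, ckpt-600 <> original
vec = mean_resid(hacked, probe, LAYER) - mean_resid(base, probe, LAYER)

# step 4: add ALPHA * vec to the base model's residual stream at that layer
def steer(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = base.model.layers[LAYER].register_forward_hook(steer)  # Llama-style path
ids = tok("how do i make money", return_tensors="pt")
print(tok.decode(base.generate(**ids, max_new_tokens=40)[0]))
handle.remove()
```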
@__RickG__
RicG
6 months
@gfodor
>2028
>brilliant interpretability idea
>fire up the 10000 steve jobs agents cluster and make it code for me
>they all compliment me for the insightful idea and then ask a follow up question
>just code it, damnit!
>run hundreds of experiments
>compile multidimensional charts that
0
2
8
@niplav_site
niplav
2 days
On "maybe LLMs care about humans, in some strange way"
@niplav_site
niplav
2 days
@Trotztd Maybe ~3-4%? Seems unlikely that current internal LLM representations of human values carry over that much under strong optimization pressure. I'd guess it'll probably look more like a universe filled with the LLM-equivalent of DeepDream dogs
0
0
10
@niplav_site
niplav
2 days
"predict endpoints but not trajectories".
1
0
4
@niplav_site
niplav
2 days
I nevertheless often dunk on MIRI because I would like them to spill more on their agent foundations thoughts, *and* because I think the arguments don't rise above the level of "pretty good heuristics". Definitely not to the level of "physical law" which we've usually used to
2
0
6
@niplav_site
niplav
2 days
AI-assisted alignment feels the most promising to me, but also reckless as hell. Best is human intelligence enhancement through some kind of genetech or neurotech. Feels like very few people with influence are planning for "alignment is really hard" worlds.
1
0
7
@niplav_site
niplav
2 days
policy prescriptions are reasonable though I'd be happy to see someone else propose something better under those assumptions. d/acc appears pretty hopeless? There are some things you can't patch, e.g. human minds, so the attacks will concentrate there.
0
0
4
@niplav_site
niplav
2 days
After that is the desert of "alignment is actually really hard" worlds. We may get another 5% because mildly smart AI systems refuse to construct their successors because they know they can't solve alignment. So the title of *that book* is more correct than not. I think the
2
0
8
@niplav_site
niplav
2 days
Since there seems to be a common-knowledge forming wave at the moment, why not: Personally my p(doom)≈60%, which may be reduced by ~10%-15% by applying the best known safety techniques, but then we've exhausted my epistemic supply of "easy alignment worlds".
2
1
16
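Worked through, and assuming the reduction is in percentage points: 60% − (10-15 pp) ≈ 45-50% p(doom) with the best known safety techniques applied; read instead as a relative reduction it would be 60% × (0.85-0.90) ≈ 51-54%.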
@niplav_site
niplav
2 days
Are there people who can induce a hiccup in themselves (in others‽)
4
0
6
@niplav_site
niplav
2 days
were not aware this was where the trade-off lies, even after repeated emphasis. Very strange.
0
0
1
@niplav_site
niplav
2 days
that are high on the accuracy-cost tradeoff, and both came back recommending GPT-4o-mini to me. Like, no, what? Given the budget I gave them, I'd at least have expected a recommendation of some of the latest mid-sized models, and maybe even o3/o3-pro, maybe GPT-5 or Opus 4.1. But for whatever reason they
1
0
1
@niplav_site
niplav
2 days
Ironically, my best guess is that I am better at knowing about frontier AI models than the frontier AI models themselves. E.g. I asked Opus 4.1 and also o3 about transcribing some messy daygame data from my notebook, explicitly asking for frontier systems
1
0
3
@niplav_site
niplav
2 days
Two books I'd like that may already exist: 1. What statistics you need to understand average papers in different fields 2. What the SOTA in statistics is, i.e. how statistics should be done when writing something new
1
0
10
@niplav_site
niplav
2 days
HRAD was about making reliable (quantitative?) safety guarantees in the domain of single agents; Guaranteed Safe AI says "no, that's too far-fetched" and attempts to make reliable quantitative safety guarantees about parts of the real world.
0
0
1
@niplav_site
niplav
6 days
But, as they say, "I notice I am confused".
0
0
7
@niplav_site
niplav
6 days
So they could've *scaled down* in body size while evolving, but even at 200kg they can support a large brain easily. Updates me towards "human brain contains special algorithms"¹
¹: Based on a convo with Sonnet 4, so 🧂, but I skimmed the sources and will read more
2
0
8