pod.link/1698192712
LessWrong (30+ Karma)
LessWrong

Audio narrations of LessWrong posts.

Listen now on

Apple Podcasts
Spotify
Overcast
Podcast Addict
Pocket Casts
Castbox
Podbean
iHeartRadio
Player FM
Podcast Republic
Castro
RSS

Episodes

“Automated Researchers Can Subtly Sandbag” by gasteigerjo, Akbir Khan, Sam Bowman, Vlad Mikulik, Ethan Perez, Fabien Roger

Twitter thread here. tl;dr When prompted, current models can sandbag ML experiments and research decisions without being detected by...

27 Mar 2025 · 8 minutes
“Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)” by Neel Nanda, lewis smith, Senthooran Rajamanoharan, Arthur Conmy, Callum McDougall, Tom Lieberum, János Kramár, Rohin Shah

Audio note: this article contains 31 uses of LaTeX notation, so the narration may be difficult to follow....

26 Mar 2025 · 57 minutes
“Conceptual Rounding Errors” by Jan_Kulveit

Epistemic status: Reasonably confident in the basic mechanism. Have you noticed that you keep encountering the same ideas over...

26 Mar 2025 · 6 minutes
“Eukaryote Skips Town - Why I’m leaving DC” by eukaryote

I’ve spent the past 7 years living in the DC area. I moved out there from the Pacific Northwest...

26 Mar 2025 · 11 minutes
“Goodhart Typology via Structure, Function, and Randomness Distributions” by JustinShovelain, Mateusz Bagiński

Audio note: this article contains 127 uses of LaTeX notation, so the narration may be difficult to follow....

26 Mar 2025 · 32 minutes
[Linkpost] “Latest map of all 40 copyright suits v. AI in U.S.” by Remmelt

This is a link post. Download the latest PDF with links to court dockets here. ...

26 Mar 2025 · 1 minute
“An overview of areas of control work” by ryan_greenblatt

In this post, I'll list all the areas of control research (and implementation) that seem promising to me. This...

26 Mar 2025 · 53 minutes
“Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?” by Alex Mallen, charlie_griffin, Buck Shlegeris

We recently released Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?, a major update to...

26 Mar 2025 · 18 minutes
“More on Various AI Action Plans” by Zvi

Last week I covered Anthropic's relatively strong submission, and OpenAI's toxic submission. This week I cover several other submissions,...

25 Mar 2025 · 21 minutes
“On (Not) Feeling the AGI” by Zvi

Ben Thompson interviewed Sam Altman recently about building a consumer tech company, and about the history of OpenAI. Mostly it...

25 Mar 2025 · 21 minutes