Redwood Research Blog

pod.link/1806316198

Redwood Research

Narrations of Redwood Research blog posts. Redwood Research is a research nonprofit based in Berkeley. We investigate risks posed by the development of...

Episodes

“Ctrl-Z: Controlling AI Agents via Resampling” by Buck Shlegeris

Subtitle: A new paper on AI control for agents. We have released a new paper, Ctrl-Z: Controlling AI...

16 Apr 2025 · 4 minutes

“To be legible, evidence of misalignment probably has to be behavioral” by Ryan Greenblatt

Subtitle: Evidence from just model internals (e.g. interpretability) is unlikely to be broadly convincing. One key hope for...

15 Apr 2025 · 5 minutes

“Why do misalignment risks increase as AIs get more capable?” by Ryan Greenblatt

Subtitle: A breakdown of how higher capabilities increase risk. It's generally agreed that as AIs get more capable,...

11 Apr 2025 · 7 minutes

“An overview of areas of control work” by Ryan Greenblatt

Subtitle: What are all the research and implementation areas helpful for control? In this post, I'll list all...

09 Apr 2025 · 54 minutes

“An overview of control measures” by Ryan Greenblatt

Subtitle: What methods can we use to ensure control? We often talk about ensuring control, which in the...

06 Apr 2025 · 50 minutes

“Buck on the 80,000 Hours podcast” by Buck Shlegeris

My podcast with Rob Wiblin from 80,000 Hours just came out. I’m really...

05 Apr 2025 · 1 minute

“Notes on countermeasures for exploration hacking (aka sandbagging)” by Ryan Greenblatt

Subtitle: How can we prevent AIs from intentionally underperforming on our metrics? If we naively apply RL to...

04 Apr 2025 · 15 minutes

“Notes on handling non-concentrated failures with AI control: high level methods and different regimes” by Ryan Greenblatt

Subtitle: What are the methods and issues when failures occur diffusely over many actions? In this post, I'll...

03 Apr 2025 · 32 minutes

“Prioritizing threats for AI control” by Ryan Greenblatt

Subtitle: What are the main threats and how should we prioritize them? We often talk about ensuring control,...

19 Mar 2025 · 20 minutes

“How might we safely pass the buck to AI?” by Josh Clymer

Subtitle: Developing AI employees that are safer than human ones. My goal as an AI safety researcher is...

19 Feb 2025 · 1 hour, 12 minutes
