Redwood Research Blog
Redwood Research

Narrations of Redwood Research blog posts. Redwood Research is a research nonprofit based in Berkeley. We investigate risks posed by the development of...

Listen now on

Apple Podcasts
Spotify
Overcast
Podcast Addict
Pocket Casts
Castbox
Podbean
iHeartRadio
Player FM
Podcast Republic
Castro
RSS

Episodes

“When does training a model change its goals?” by Vivek Hebbar, Ryan Greenblatt

Subtitle: Can a scheming AI's goals really stay unchanged through training? Here are two opposing pictures of how...

12 Jun 2025 · 29 minutes
“The case for countermeasures to memetic spread of misaligned values” by Alex Mallen

Subtitle: Defending against alignment problems that might come with long-term memory. As various people have written about before...

28 May 2025 · 13 minutes
“AIs at the current capability level may be important for future safety work” by Ryan Greenblatt

Subtitle: Some reasons why relatively weak AIs might still be important when we have very powerful AIs. Sometimes...

12 May 2025 · 7 minutes
“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Julian Stastny, Buck Shlegeris

Subtitle: A new analysis of the risk of AIs intentionally performing poorly. In the future, we will want...

08 May 2025 · 30 minutes
“Training-time schemers vs behavioral schemers” by Alex Mallen

Subtitle: Clarifying ways in which faking alignment during training is neither necessary nor sufficient for the kind of scheming...

06 May 2025 · 13 minutes
“What’s going on with AI progress and trends? (As of 5/2025)” by Ryan Greenblatt

Subtitle: My views on what's driving AI progress and where it's headed. AI progress is driven by improved...

03 May 2025 · 16 minutes
“How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar

Subtitle: Preventing research sabotage will require techniques very different from the original control paper. Misaligned AIs might engage...

30 Apr 2025 · 15 minutes
“7+ tractable directions in AI control” by Ryan Greenblatt

Subtitle: A list of easy-to-start directions in AI control targeted at independent researchers without as much context or compute...

29 Apr 2025 · 26 minutes
“Clarifying AI R&D threat models” by Josh Clymer

Subtitle: (There are a few.) A casual reader of one of the many AI company safety frameworks might...

25 Apr 2025 · 9 minutes
“How training-gamers might function (and win)” by Vivek Hebbar

Subtitle: A model of the relationship between higher level goals, explicit reasoning, and learned heuristics in capable agents...

24 Apr 2025 · 32 minutes