pod.link/1687830086
AI Safety Fundamentals
BlueDot Impact

Listen to resources from the AI Safety Fundamentals courses! https://aisafetyfundamentals.com/

Listen now on

Apple Podcasts
Spotify
Overcast
Podcast Addict
Pocket Casts
Castbox
Podbean
iHeartRadio
Player FM
Podcast Republic
Castro
RSS

Episodes

Progress on Causal Influence Diagrams

By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg. About 2 years ago, we released the... more

04 Jan 2025 · 23 minutes
Careers in Alignment

Richard Ngo compiles a number of resources for thinking about careers in alignment research. Original text: https://docs.google.com/document/d/1iFszDulgpu1aZcq_aYFG7Nmcr5zgOhaeSwavOMk1akw/edit#heading=h.4whc9v22p7tb Narrated for AI Safety Fundamentals by... more

04 Jan 2025 · 7 minutes
Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance

Transformative artificial intelligence (TAI) may be a key factor in the long-run trajectory of civilization. A growing interdisciplinary community has... more

04 Jan 2025 · 27 minutes
Logical Induction (Blog Post)

MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi... more

04 Jan 2025 · 11 minutes
Embedded Agents

Suppose you want to build a robot to achieve some real-world goal for you—a goal that requires the robot to... more

04 Jan 2025 · 17 minutes
Understanding Intermediate Layers Using Linear Classifier Probes

Abstract: Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of... more

04 Jan 2025 · 16 minutes
Feature Visualization

There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability... more

04 Jan 2025 · 31 minutes
Acquisition of Chess Knowledge in AlphaZero

Abstract: What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest.... more

04 Jan 2025 · 22 minutes
Takeaways From Our Robust Injury Classifier Project [Redwood Research]

With the benefit of hindsight, we have a better sense of our takeaways from our first adversarial training project (paper).... more

04 Jan 2025 · 12 minutes
High-Stakes Alignment via Adversarial Training [Redwood Research Report]

(Update: We think the tone of this post was overly positive considering our somewhat weak results. You can read our... more

04 Jan 2025 · 19 minutes