To listen to a podcast, please open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
Episode | Date |
---|---|
AI Watermarking Won’t Curb Disinformation | Jan 04, 2025 |
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small | Jan 04, 2025 |
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Jan 04, 2025 |
Zoom In: An Introduction to Circuits | Jan 04, 2025 |
Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Jan 04, 2025 |
Can We Scale Human Feedback for Complex AI Tasks? | Jan 04, 2025 |
Machine Learning for Humans: Supervised Learning | Jan 04, 2025 |
On the Opportunities and Risks of Foundation Models | Jan 04, 2025 |
Intelligence Explosion: Evidence and Import | Jan 04, 2025 |
Visualizing the Deep Learning Revolution | Jan 04, 2025 |
Future ML Systems Will Be Qualitatively Different | Jan 04, 2025 |
More Is Different for AI | Jan 04, 2025 |
A Short Introduction to Machine Learning | Jan 04, 2025 |
Biological Anchors: A Trick That Might Or Might Not Work | Jan 04, 2025 |
Four Background Claims | Jan 04, 2025 |
AGI Safety From First Principles | Jan 04, 2025 |
The Alignment Problem From a Deep Learning Perspective | Jan 04, 2025 |
The Easy Goal Inference Problem Is Still Hard | Jan 04, 2025 |
Superintelligence: Instrumental Convergence | Jan 04, 2025 |
Specification Gaming: The Flip Side of AI Ingenuity | Jan 04, 2025 |
Learning From Human Preferences | Jan 04, 2025 |
What Failure Looks Like | Jan 04, 2025 |
Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It | Jan 04, 2025 |
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough for Correct Goals | Jan 04, 2025 |
Thought Experiments Provide a Third Anchor | Jan 04, 2025 |
ML Systems Will Have Weird Failure Modes | Jan 04, 2025 |
Where I Agree and Disagree with Eliezer | Jan 04, 2025 |
AGI Ruin: A List of Lethalities | Jan 04, 2025 |
Why AI Alignment Could Be Hard With Modern Deep Learning | Jan 04, 2025 |
Yudkowsky Contra Christiano on AI Takeoff Speeds | Jan 04, 2025 |
Is Power-Seeking AI an Existential Risk? | Jan 04, 2025 |
Measuring Progress on Scalable Oversight for Large Language Models | Jan 04, 2025 |
Supervising Strong Learners by Amplifying Weak Experts | Jan 04, 2025 |
Summarizing Books With Human Feedback | Jan 04, 2025 |
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models | Jan 04, 2025 |
AI Safety via Debate | Jan 04, 2025 |
AI Safety via Red Teaming Language Models With Language Models | Jan 04, 2025 |
Robust Feature-Level Adversaries Are Interpretability Tools | Jan 04, 2025 |
Debate Update: Obfuscated Arguments Problem | Jan 04, 2025 |
Introduction to Logical Decision Theory for Computer Scientists | Jan 04, 2025 |
High-Stakes Alignment via Adversarial Training [Redwood Research Report] | Jan 04, 2025 |
Takeaways From Our Robust Injury Classifier Project [Redwood Research] | Jan 04, 2025 |
Acquisition of Chess Knowledge in Alphazero | Jan 04, 2025 |
Feature Visualization | Jan 04, 2025 |
Understanding Intermediate Layers Using Linear Classifier Probes | Jan 04, 2025 |
Embedded Agents | Jan 04, 2025 |
Logical Induction (Blog Post) | Jan 04, 2025 |
Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance | Jan 04, 2025 |
Careers in Alignment | Jan 04, 2025 |
Progress on Causal Influence Diagrams | Jan 04, 2025 |
We Need a Science of Evals | Jan 04, 2025 |
Introduction to Mechanistic Interpretability | Jan 04, 2025 |
Constitutional AI Harmlessness from AI Feedback | Jan 04, 2025 |
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | Jan 04, 2025 |
Illustrating Reinforcement Learning from Human Feedback (RLHF) | Jan 04, 2025 |
Eliciting Latent Knowledge | Jan 04, 2025 |
Deep Double Descent | Jan 04, 2025 |
Chinchilla’s Wild Implications | Jan 04, 2025 |
Intro to Brain-Like-AGI Safety | Jan 04, 2025 |
Gradient Hacking: Definitions and Examples | Jan 04, 2025 |
An Investigation of Model-Free Planning | Jan 04, 2025 |
Discovering Latent Knowledge in Language Models Without Supervision | Jan 04, 2025 |
Toy Models of Superposition | Jan 04, 2025 |
Imitative Generalisation (AKA ‘Learning the Prior’) | Jan 04, 2025 |
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation | Jan 04, 2025 |
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions | Jan 04, 2025 |
Low-Stakes Alignment | Jan 04, 2025 |
Empirical Findings Generalize Surprisingly Far | Jan 04, 2025 |
Compute Trends Across Three Eras of Machine Learning | Jan 04, 2025 |
Worst-Case Thinking in AI Alignment | Jan 04, 2025 |
How to Get Feedback | Jan 04, 2025 |
Public by Default: How We Manage Information Visibility at Get on Board | Jan 04, 2025 |
Writing, Briefly | Jan 04, 2025 |
Being the (Pareto) Best in the World | Jan 04, 2025 |
How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach | Jan 04, 2025 |
Become a Person who Actually Does Things | Jan 04, 2025 |
Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points | Jan 04, 2025 |
Working in AI Alignment | Jan 04, 2025 |
Computing Power and the Governance of AI | Jan 04, 2025 |
AI Control: Improving Safety Despite Intentional Subversion | Jan 04, 2025 |
Challenges in Evaluating AI Systems | Jan 04, 2025 |
Emerging Processes for Frontier AI Safety | Jan 04, 2025 |
If-Then Commitments for AI Risk Reduction | Jan 02, 2025 |
This is How AI Will Transform How Science Gets Done | Jan 02, 2025 |
Open-Sourcing Highly Capable Foundation Models: An Evaluation of Risks, Benefits, and Alternative Methods for Pursuing Open-Source Objectives | Dec 30, 2024 |
So You Want to be a Policy Entrepreneur? | Dec 30, 2024 |
Considerations for Governing Open Foundation Models | Dec 30, 2024 |
Driving U.S. Innovation in Artificial Intelligence: A Roadmap for Artificial Intelligence Policy in the United States Senate | May 22, 2024 |
Societal Adaptation to Advanced AI | May 20, 2024 |
The AI Triad and What It Means for National Security Strategy | May 20, 2024 |
OECD AI Principles | May 13, 2024 |
A pro-innovation approach to AI regulation: government response | May 13, 2024 |
The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023 | May 13, 2024 |
Key facts: UNESCO’s Recommendation on the Ethics of Artificial Intelligence | May 13, 2024 |
AI Index Report 2024, Chapter 7: Policy and Governance | May 13, 2024 |
Recent U.S. Efforts on AI Policy | May 13, 2024 |
FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence | May 13, 2024 |
High-level summary of the AI Act | May 13, 2024 |
China’s AI Regulations and How They Get Made | May 13, 2024 |
The Policy Playbook: Building a Systems-Oriented Approach to Technology and National Security Policy | May 05, 2024 |
Strengthening Resilience to AI Risk: A Guide for UK Policymakers | May 04, 2024 |
The Convergence of Artificial Intelligence and the Life Sciences: Safeguarding Technology, Rethinking Governance, and Preventing Catastrophe | May 03, 2024 |
Rogue AIs | May 01, 2024 |
What is AI Alignment? | May 01, 2024 |
An Overview of Catastrophic AI Risks | Apr 29, 2024 |
Future Risks of Frontier AI | Apr 23, 2024 |
What risks does AI pose? | Apr 23, 2024 |
AI Could Defeat All Of Us Combined | Apr 22, 2024 |
Moore's Law for Everything | Apr 16, 2024 |
The Transformative Potential of Artificial Intelligence | Apr 16, 2024 |
Positive AI Economic Futures | Apr 16, 2024 |
The Economic Potential of Generative AI: The Next Productivity Frontier | Apr 16, 2024 |
Visualizing the Deep Learning Revolution | May 13, 2023 |
The AI Triad and What It Means for National Security Strategy | May 13, 2023 |
A Short Introduction to Machine Learning | May 13, 2023 |
As AI Agents Like Auto-GPT Speed up Generative AI Race, We All Need to Buckle Up | May 13, 2023 |
Overview of How AI Might Exacerbate Long-Running Catastrophic Risks | May 13, 2023 |
The Need for Work on Technical AI Alignment | May 13, 2023 |
Specification Gaming: The Flip Side of AI Ingenuity | May 13, 2023 |
Why Might Misaligned, Advanced AI Cause Catastrophe? | May 13, 2023 |
Emergent Deception and Emergent Optimization | May 13, 2023 |
AI Safety Seems Hard to Measure | May 13, 2023 |
Nobody’s on the Ball on AGI Alignment | May 13, 2023 |
Avoiding Extreme Global Vulnerability as a Core AI Governance Problem | May 13, 2023 |
Primer on Safety Standards and Regulations for Industrial-Scale AI Development | May 13, 2023 |
Frontier AI Regulation: Managing Emerging Risks to Public Safety | May 13, 2023 |
Model Evaluation for Extreme Risks | May 13, 2023 |
The State of AI in Different Countries — An Overview | May 13, 2023 |
Primer on AI Chips and AI Governance | May 13, 2023 |
Choking off China’s Access to the Future of AI | May 13, 2023 |
Racing Through a Minefield: The AI Deployment Problem | May 13, 2023 |
What Does It Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring | May 13, 2023 |
A Tour of Emerging Cryptographic Technologies | May 13, 2023 |
Historical Case Studies of Technology Governance and International Agreements | May 13, 2023 |
International Institutions for Advanced AI | May 13, 2023 |
OpenAI Charter | May 13, 2023 |
LP Announcement by OpenAI | May 13, 2023 |
What AI Companies Can Do Today to Help With the Most Important Century | May 13, 2023 |
12 Tentative Ideas for Us AI Policy | May 13, 2023 |
Let’s Think About Slowing Down AI | May 13, 2023 |
Some Talent Needs in AI Governance | May 13, 2023 |
AI Governance Needs Technical Work | May 13, 2023 |
Career Resources on AI Strategy Research | May 13, 2023 |
China-Related AI Safety and Governance Paths | May 13, 2023 |
List of EA Funding Opportunities | May 13, 2023 |
My Current Impressions on Career Choice for Longtermists | May 13, 2023 |