AI Safety Fundamentals

By BlueDot Impact

To listen to this podcast, open the Podcast Republic app, available on the Google Play Store and the Apple App Store.

Image by BlueDot Impact

Category: Technology

Subscribers: 1
Reviews: 0
Episodes: 147

Description

Listen to resources from the AI Safety Fundamentals courses! https://aisafetyfundamentals.com/

Episode / Date
AI Watermarking Won’t Curb Disinformation
Jan 04, 2025
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small
Jan 04, 2025
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Jan 04, 2025
Zoom In: An Introduction to Circuits
Jan 04, 2025
Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Jan 04, 2025
Can We Scale Human Feedback for Complex AI Tasks?
Jan 04, 2025
Machine Learning for Humans: Supervised Learning
Jan 04, 2025
On the Opportunities and Risks of Foundation Models
Jan 04, 2025
Intelligence Explosion: Evidence and Import
Jan 04, 2025
Visualizing the Deep Learning Revolution
Jan 04, 2025
Future ML Systems Will Be Qualitatively Different
Jan 04, 2025
More Is Different for AI
Jan 04, 2025
A Short Introduction to Machine Learning
Jan 04, 2025
Biological Anchors: A Trick That Might Or Might Not Work
Jan 04, 2025
Four Background Claims
Jan 04, 2025
AGI Safety From First Principles
Jan 04, 2025
The Alignment Problem From a Deep Learning Perspective
Jan 04, 2025
The Easy Goal Inference Problem Is Still Hard
Jan 04, 2025
Superintelligence: Instrumental Convergence
Jan 04, 2025
Specification Gaming: The Flip Side of AI Ingenuity
Jan 04, 2025
Learning From Human Preferences
Jan 04, 2025
What Failure Looks Like
Jan 04, 2025
Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It
Jan 04, 2025
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough for Correct Goals
Jan 04, 2025
Thought Experiments Provide a Third Anchor
Jan 04, 2025
ML Systems Will Have Weird Failure Modes
Jan 04, 2025
Where I Agree and Disagree with Eliezer
Jan 04, 2025
AGI Ruin: A List of Lethalities
Jan 04, 2025
Why AI Alignment Could Be Hard With Modern Deep Learning
Jan 04, 2025
Yudkowsky Contra Christiano on AI Takeoff Speeds
Jan 04, 2025
Is Power-Seeking AI an Existential Risk?
Jan 04, 2025
Measuring Progress on Scalable Oversight for Large Language Models
Jan 04, 2025
Supervising Strong Learners by Amplifying Weak Experts
Jan 04, 2025
Summarizing Books With Human Feedback
Jan 04, 2025
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Jan 04, 2025
AI Safety via Debate
Jan 04, 2025
AI Safety via Red Teaming Language Models With Language Models
Jan 04, 2025
Robust Feature-Level Adversaries Are Interpretability Tools
Jan 04, 2025
Debate Update: Obfuscated Arguments Problem
Jan 04, 2025
Introduction to Logical Decision Theory for Computer Scientists
Jan 04, 2025
High-Stakes Alignment via Adversarial Training [Redwood Research Report]
Jan 04, 2025
Takeaways From Our Robust Injury Classifier Project [Redwood Research]
Jan 04, 2025
Acquisition of Chess Knowledge in AlphaZero
Jan 04, 2025
Feature Visualization
Jan 04, 2025
Understanding Intermediate Layers Using Linear Classifier Probes
Jan 04, 2025
Embedded Agents
Jan 04, 2025
Logical Induction (Blog Post)
Jan 04, 2025
Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance
Jan 04, 2025
Careers in Alignment
Jan 04, 2025
Progress on Causal Influence Diagrams
Jan 04, 2025
We Need a Science of Evals
Jan 04, 2025
Introduction to Mechanistic Interpretability
Jan 04, 2025
Constitutional AI: Harmlessness from AI Feedback
Jan 04, 2025
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Jan 04, 2025
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Jan 04, 2025
Eliciting Latent Knowledge
Jan 04, 2025
Deep Double Descent
Jan 04, 2025
Chinchilla’s Wild Implications
Jan 04, 2025
Intro to Brain-Like-AGI Safety
Jan 04, 2025
Gradient Hacking: Definitions and Examples
Jan 04, 2025
An Investigation of Model-Free Planning
Jan 04, 2025
Discovering Latent Knowledge in Language Models Without Supervision
Jan 04, 2025
Toy Models of Superposition
Jan 04, 2025
Imitative Generalisation (AKA ‘Learning the Prior’)
Jan 04, 2025
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation
Jan 04, 2025
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions
Jan 04, 2025
Low-Stakes Alignment
Jan 04, 2025
Empirical Findings Generalize Surprisingly Far
Jan 04, 2025
Compute Trends Across Three Eras of Machine Learning
Jan 04, 2025
Worst-Case Thinking in AI Alignment
Jan 04, 2025
How to Get Feedback
Jan 04, 2025
Public by Default: How We Manage Information Visibility at Get on Board
Jan 04, 2025
Writing, Briefly
Jan 04, 2025
Being the (Pareto) Best in the World
Jan 04, 2025
How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach
Jan 04, 2025
Become a Person who Actually Does Things
Jan 04, 2025
Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points
Jan 04, 2025
Working in AI Alignment
Jan 04, 2025
Computing Power and the Governance of AI
Jan 04, 2025
AI Control: Improving Safety Despite Intentional Subversion
Jan 04, 2025
Challenges in Evaluating AI Systems
Jan 04, 2025
Emerging Processes for Frontier AI Safety
Jan 04, 2025
If-Then Commitments for AI Risk Reduction
Jan 02, 2025
This is How AI Will Transform How Science Gets Done
Jan 02, 2025
Open-Sourcing Highly Capable Foundation Models: An Evaluation of Risks, Benefits, and Alternative Methods for Pursuing Open-Source Objectives
Dec 30, 2024
So You Want to be a Policy Entrepreneur?
Dec 30, 2024
Considerations for Governing Open Foundation Models
Dec 30, 2024
Driving U.S. Innovation in Artificial Intelligence: A Roadmap for Artificial Intelligence Policy in the United States Senate
May 22, 2024
Societal Adaptation to Advanced AI
May 20, 2024
The AI Triad and What It Means for National Security Strategy
May 20, 2024
OECD AI Principles
May 13, 2024
A pro-innovation approach to AI regulation: government response
May 13, 2024
The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023
May 13, 2024
Key facts: UNESCO’s Recommendation on the Ethics of Artificial Intelligence
May 13, 2024
AI Index Report 2024, Chapter 7: Policy and Governance
May 13, 2024
Recent U.S. Efforts on AI Policy
May 13, 2024
FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
May 13, 2024
High-level summary of the AI Act
May 13, 2024
China’s AI Regulations and How They Get Made
May 13, 2024
The Policy Playbook: Building a Systems-Oriented Approach to Technology and National Security Policy
May 05, 2024
Strengthening Resilience to AI Risk: A Guide for UK Policymakers
May 04, 2024
The Convergence of Artificial Intelligence and the Life Sciences: Safeguarding Technology, Rethinking Governance, and Preventing Catastrophe
May 03, 2024
Rogue AIs
May 01, 2024
What is AI Alignment?
May 01, 2024
An Overview of Catastrophic AI Risks
Apr 29, 2024
Future Risks of Frontier AI
Apr 23, 2024
What risks does AI pose?
Apr 23, 2024
AI Could Defeat All Of Us Combined
Apr 22, 2024
Moore's Law for Everything
Apr 16, 2024
The Transformative Potential of Artificial Intelligence
Apr 16, 2024
Positive AI Economic Futures
Apr 16, 2024
The Economic Potential of Generative AI: The Next Productivity Frontier
Apr 16, 2024
Visualizing the Deep Learning Revolution
May 13, 2023
The AI Triad and What It Means for National Security Strategy
May 13, 2023
A Short Introduction to Machine Learning
May 13, 2023
As AI Agents Like Auto-GPT Speed up Generative AI Race, We All Need to Buckle Up
May 13, 2023
Overview of How AI Might Exacerbate Long-Running Catastrophic Risks
May 13, 2023
The Need for Work on Technical AI Alignment
May 13, 2023
Specification Gaming: The Flip Side of AI Ingenuity
May 13, 2023
Why Might Misaligned, Advanced AI Cause Catastrophe?
May 13, 2023
Emergent Deception and Emergent Optimization
May 13, 2023
AI Safety Seems Hard to Measure
May 13, 2023
Nobody’s on the Ball on AGI Alignment
May 13, 2023
Avoiding Extreme Global Vulnerability as a Core AI Governance Problem
May 13, 2023
Primer on Safety Standards and Regulations for Industrial-Scale AI Development
May 13, 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
May 13, 2023
Model Evaluation for Extreme Risks
May 13, 2023
The State of AI in Different Countries — An Overview
May 13, 2023
Primer on AI Chips and AI Governance
May 13, 2023
Choking off China’s Access to the Future of AI
May 13, 2023
Racing Through a Minefield: The AI Deployment Problem
May 13, 2023
What Does It Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
May 13, 2023
A Tour of Emerging Cryptographic Technologies
May 13, 2023
Historical Case Studies of Technology Governance and International Agreements
May 13, 2023
International Institutions for Advanced AI
May 13, 2023
OpenAI Charter
May 13, 2023
LP Announcement by OpenAI
May 13, 2023
What AI Companies Can Do Today to Help With the Most Important Century
May 13, 2023
12 Tentative Ideas for US AI Policy
May 13, 2023
Let’s Think About Slowing Down AI
May 13, 2023
Some Talent Needs in AI Governance
May 13, 2023
AI Governance Needs Technical Work
May 13, 2023
Career Resources on AI Strategy Research
May 13, 2023
China-Related AI Safety and Governance Paths
May 13, 2023
List of EA Funding Opportunities
May 13, 2023
My Current Impressions on Career Choice for Longtermists
May 13, 2023
AI Governance Needs Technical Work
May 13, 2023