To listen to a podcast, please open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
Episode | Date |
---|---|
AI Watermarking Won’t Curb Disinformation | Jan 04, 2025 |
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small | Jan 04, 2025 |
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Jan 04, 2025 |
Zoom In: An Introduction to Circuits | Jan 04, 2025 |
Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Jan 04, 2025 |
Can We Scale Human Feedback for Complex AI Tasks? | Jan 04, 2025 |
Machine Learning for Humans: Supervised Learning | Jan 04, 2025 |
On the Opportunities and Risks of Foundation Models | Jan 04, 2025 |
Intelligence Explosion: Evidence and Import | Jan 04, 2025 |
Visualizing the Deep Learning Revolution | Jan 04, 2025 |
Future ML Systems Will Be Qualitatively Different | Jan 04, 2025 |
More Is Different for AI | Jan 04, 2025 |
A Short Introduction to Machine Learning | Jan 04, 2025 |
Biological Anchors: A Trick That Might Or Might Not Work | Jan 04, 2025 |
Four Background Claims | Jan 04, 2025 |
AGI Safety From First Principles | Jan 04, 2025 |
The Alignment Problem From a Deep Learning Perspective | Jan 04, 2025 |
The Easy Goal Inference Problem Is Still Hard | Jan 04, 2025 |
Superintelligence: Instrumental Convergence | Jan 04, 2025 |
Specification Gaming: The Flip Side of AI Ingenuity | Jan 04, 2025 |
Learning From Human Preferences | Jan 04, 2025 |
What Failure Looks Like | Jan 04, 2025 |
Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It | Jan 04, 2025 |
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough for Correct Goals | Jan 04, 2025 |
Thought Experiments Provide a Third Anchor | Jan 04, 2025 |
ML Systems Will Have Weird Failure Modes | Jan 04, 2025 |
Where I Agree and Disagree with Eliezer | Jan 04, 2025 |
AGI Ruin: A List of Lethalities | Jan 04, 2025 |
Why AI Alignment Could Be Hard With Modern Deep Learning | Jan 04, 2025 |
Yudkowsky Contra Christiano on AI Takeoff Speeds | Jan 04, 2025 |
Is Power-Seeking AI an Existential Risk? | Jan 04, 2025 |
Measuring Progress on Scalable Oversight for Large Language Models | Jan 04, 2025 |
Supervising Strong Learners by Amplifying Weak Experts | Jan 04, 2025 |
Summarizing Books With Human Feedback | Jan 04, 2025 |
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models | Jan 04, 2025 |
AI Safety via Debate | Jan 04, 2025 |
AI Safety via Red Teaming Language Models With Language Models | Jan 04, 2025 |
Robust Feature-Level Adversaries Are Interpretability Tools | Jan 04, 2025 |
Debate Update: Obfuscated Arguments Problem | Jan 04, 2025 |
Introduction to Logical Decision Theory for Computer Scientists | Jan 04, 2025 |
High-Stakes Alignment via Adversarial Training [Redwood Research Report] | Jan 04, 2025 |
Takeaways From Our Robust Injury Classifier Project [Redwood Research] | Jan 04, 2025 |
Acquisition of Chess Knowledge in Alphazero | Jan 04, 2025 |
Feature Visualization | Jan 04, 2025 |
Understanding Intermediate Layers Using Linear Classifier Probes | Jan 04, 2025 |
Embedded Agents | Jan 04, 2025 |
Logical Induction (Blog Post) | Jan 04, 2025 |
Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance | Jan 04, 2025 |
Careers in Alignment | Jan 04, 2025 |
Progress on Causal Influence Diagrams | Jan 04, 2025 |
We Need a Science of Evals | Jan 04, 2025 |
Introduction to Mechanistic Interpretability | Jan 04, 2025 |
Constitutional AI Harmlessness from AI Feedback | Jan 04, 2025 |
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | Jan 04, 2025 |
Illustrating Reinforcement Learning from Human Feedback (RLHF) | Jan 04, 2025 |
Eliciting Latent Knowledge | Jan 04, 2025 |
Deep Double Descent | Jan 04, 2025 |
Chinchilla’s Wild Implications | Jan 04, 2025 |
Intro to Brain-Like-AGI Safety | Jan 04, 2025 |
Gradient Hacking: Definitions and Examples | Jan 04, 2025 |
An Investigation of Model-Free Planning | Jan 04, 2025 |
Discovering Latent Knowledge in Language Models Without Supervision | Jan 04, 2025 |
Toy Models of Superposition | Jan 04, 2025 |
Imitative Generalisation (AKA ‘Learning the Prior’) | Jan 04, 2025 |
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation | Jan 04, 2025 |
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions | Jan 04, 2025 |
Low-Stakes Alignment | Jan 04, 2025 |
Empirical Findings Generalize Surprisingly Far | Jan 04, 2025 |
Compute Trends Across Three Eras of Machine Learning | Jan 04, 2025 |
Worst-Case Thinking in AI Alignment | Jan 04, 2025 |
How to Get Feedback | Jan 04, 2025 |
Public by Default: How We Manage Information Visibility at Get on Board | Jan 04, 2025 |
Writing, Briefly | Jan 04, 2025 |
Being the (Pareto) Best in the World | Jan 04, 2025 |
How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach | Jan 04, 2025 |
Become a Person who Actually Does Things | Jan 04, 2025 |
Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points | Jan 04, 2025 |
Working in AI Alignment | Jan 04, 2025 |
Computing Power and the Governance of AI | Jan 04, 2025 |
AI Control: Improving Safety Despite Intentional Subversion | Jan 04, 2025 |
Challenges in Evaluating AI Systems | Jan 04, 2025 |
Emerging Processes for Frontier AI Safety | Jan 04, 2025 |
If-Then Commitments for AI Risk Reduction | Jan 02, 2025 |
This is How AI Will Transform How Science Gets Done | Jan 02, 2025 |
Open-Sourcing Highly Capable Foundation Models: An Evaluation of Risks, Benefits, and Alternative Methods for Pursuing Open-Source Objectives | Dec 30, 2024 |
So You Want to be a Policy Entrepreneur? | Dec 30, 2024 |
Considerations for Governing Open Foundation Models | Dec 30, 2024 |
Driving U.S. Innovation in Artificial Intelligence: A Roadmap for Artificial Intelligence Policy in the United States Senate | May 22, 2024 |
Societal Adaptation to Advanced AI | May 20, 2024 |
The AI Triad and What It Means for National Security Strategy | May 20, 2024 |
OECD AI Principles | May 13, 2024 |
A pro-innovation approach to AI regulation: government response | May 13, 2024 |
The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023 | May 13, 2024 |
Key facts: UNESCO’s Recommendation on the Ethics of Artificial Intelligence | May 13, 2024 |
AI Index Report 2024, Chapter 7: Policy and Governance | May 13, 2024 |
Recent U.S. Efforts on AI Policy | May 13, 2024 |
FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence | May 13, 2024 |
High-level summary of the AI Act | May 13, 2024 |
China’s AI Regulations and How They Get Made | May 13, 2024 |
The Policy Playbook: Building a Systems-Oriented Approach to Technology and National Security Policy | May 05, 2024 |
Strengthening Resilience to AI Risk: A Guide for UK Policymakers | May 04, 2024 |
The Convergence of Artificial Intelligence and the Life Sciences: Safeguarding Technology, Rethinking Governance, and Preventing Catastrophe | May 03, 2024 |
Rogue AIs | May 01, 2024 |
What is AI Alignment? | May 01, 2024 |
An Overview of Catastrophic AI Risks | Apr 29, 2024 |
Future Risks of Frontier AI | Apr 23, 2024 |
What risks does AI pose? | Apr 23, 2024 |
AI Could Defeat All Of Us Combined | Apr 22, 2024 |
Moore's Law for Everything | Apr 16, 2024 |
The Transformative Potential of Artificial Intelligence | Apr 16, 2024 |
Positive AI Economic Futures | Apr 16, 2024 |
The Economic Potential of Generative AI: The Next Productivity Frontier | Apr 16, 2024 |
Visualizing the Deep Learning Revolution | May 13, 2023 |
The AI Triad and What It Means for National Security Strategy | May 13, 2023 |
A Short Introduction to Machine Learning | May 13, 2023 |
As AI Agents Like Auto-GPT Speed up Generative AI Race, We All Need to Buckle Up | May 13, 2023 |
Overview of How AI Might Exacerbate Long-Running Catastrophic Risks | May 13, 2023 |
The Need for Work on Technical AI Alignment | May 13, 2023 |
Specification Gaming: The Flip Side of AI Ingenuity | May 13, 2023 |
Why Might Misaligned, Advanced AI Cause Catastrophe? | May 13, 2023 |
Emergent Deception and Emergent Optimization | May 13, 2023 |
AI Safety Seems Hard to Measure | May 13, 2023 |
Nobody’s on the Ball on AGI Alignment | May 13, 2023 |
Avoiding Extreme Global Vulnerability as a Core AI Governance Problem | May 13, 2023 |
Primer on Safety Standards and Regulations for Industrial-Scale AI Development | May 13, 2023 |
Frontier AI Regulation: Managing Emerging Risks to Public Safety | May 13, 2023 |
Model Evaluation for Extreme Risks | May 13, 2023 |
The State of AI in Different Countries — An Overview | May 13, 2023 |
Primer on AI Chips and AI Governance | May 13, 2023 |
Choking off China’s Access to the Future of AI | May 13, 2023 |
Racing Through a Minefield: The AI Deployment Problem | May 13, 2023 |
What Does It Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring | May 13, 2023 |
A Tour of Emerging Cryptographic Technologies | May 13, 2023 |
Historical Case Studies of Technology Governance and International Agreements | May 13, 2023 |
International Institutions for Advanced AI | May 13, 2023 |
OpenAI Charter | May 13, 2023 |
LP Announcement by OpenAI | May 13, 2023 |
What AI Companies Can Do Today to Help With the Most Important Century | May 13, 2023 |
12 Tentative Ideas for Us AI Policy | May 13, 2023 |
Let’s Think About Slowing Down AI | May 13, 2023 |
Some Talent Needs in AI Governance | May 13, 2023 |
AI Governance Needs Technical Work | May 13, 2023 |
Career Resources on AI Strategy Research | May 13, 2023 |
China-Related AI Safety and Governance Paths | May 13, 2023 |
List of EA Funding Opportunities | May 13, 2023 |
My Current Impressions on Career Choice for Longtermists | May 13, 2023 |