Listen to resources from the AI Safety Fundamentals courses! https://aisafetyfundamentals.com/
By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg. About 2 years ago, we released the…
Richard Ngo compiles a number of resources for thinking about careers in alignment research. Original text: https://docs.google.com/document/d/1iFszDulgpu1aZcq_aYFG7Nmcr5zgOhaeSwavOMk1akw/edit#heading=h.4whc9v22p7tb Narrated for AI Safety Fundamentals by…
Transformative artificial intelligence (TAI) may be a key factor in the long-run trajectory of civilization. A growing interdisciplinary community has…
MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi…
Suppose you want to build a robot to achieve some real-world goal for you—a goal that requires the robot to…
Abstract: Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of…
There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability…
Abstract: What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest…
With the benefit of hindsight, we have a better sense of our takeaways from our first adversarial training project (paper)…
(Update: We think the tone of this post was overly positive considering our somewhat weak results. You can read our…