Listen to resources from the AI Safety Fundamentals courses! https://aisafetyfundamentals.com/
By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg. About 2 years ago, we released the…
Richard Ngo compiles a number of resources for thinking about careers in alignment research. Original text: https://docs.google.com/document/d/1iFszDulgpu1aZcq_aYFG7Nmcr5zgOhaeSwavOMk1akw/edit#heading=h.4whc9v22p7tb Narrated for AI Safety Fundamentals by…
Transformative artificial intelligence (TAI) may be a key factor in the long-run trajectory of civilization. A growing interdisciplinary community has…
MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi…
Suppose you want to build a robot to achieve some real-world goal for you—a goal that requires the robot to…
Abstract: Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of…
There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability…
Abstract: What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. …
With the benefit of hindsight, we have a better sense of our takeaways from our first adversarial training project (paper). …
(Update: We think the tone of this post was overly positive considering our somewhat weak results. You can read our…