To listen to this podcast, please open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
Episode | Date |
---|---|
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub | Mar 16, 2025 |
“The Most Forbidden Technique” by Zvi | Mar 14, 2025 |
“Trojan Sky” by Richard_Ngo | Mar 13, 2025 |
“OpenAI:” by Daniel Kokotajlo | Mar 11, 2025 |
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis | Mar 09, 2025 |
“So how well is Claude playing Pokémon?” by Julian Bradshaw | Mar 09, 2025 |
“Methods for strong human germline engineering” by TsviBT | Mar 07, 2025 |
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth | Mar 06, 2025 |
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis | Mar 06, 2025 |
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard | Mar 05, 2025 |
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout | Mar 04, 2025 |
“Judgements: Merging Prediction & Evidence” by abramdemski | Mar 01, 2025 |
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis | Feb 26, 2025 |
“Power Lies Trembling: a three-book review” by Richard_Ngo | Feb 26, 2025 |
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans | Feb 26, 2025 |
“The Paris AI Anti-Safety Summit” by Zvi | Feb 22, 2025 |
“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby | Feb 20, 2025 |
“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby | Feb 20, 2025 |
“How to Make Superbabies” by GeneSmith, kman | Feb 20, 2025 |
“A computational no-coincidence principle” by Eric Neyman | Feb 19, 2025 |
“A History of the Future, 2025-2040” by L Rudolf L | Feb 19, 2025 |
“It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape | Feb 18, 2025 |
“Some articles in ‘International Security’ that I enjoyed” by Buck | Feb 16, 2025 |
“The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace | Feb 16, 2025 |
“Murder plots are infohazards” by Chris Monteiro | Feb 14, 2025 |
“Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison | Feb 11, 2025 |
“The ‘Think It Faster’ Exercise” by Raemon | Feb 09, 2025 |
“So You Want To Make Marginal Progress...” by johnswentworth | Feb 08, 2025 |
“What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus | Feb 08, 2025 |
“How AI Takeover Might Happen in 2 Years” by joshc | Feb 08, 2025 |
“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit | Feb 05, 2025 |
“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud | Feb 04, 2025 |
“Planning for Extreme AI Risks” by joshc | Feb 03, 2025 |
“Catastrophe through Chaos” by Marius Hobbhahn | Feb 03, 2025 |
“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt | Feb 01, 2025 |
“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes | Jan 30, 2025 |
“Ten people on the inside” by Buck | Jan 29, 2025 |
“Anomalous Tokens in DeepSeek-V3 and r1” by henry | Jan 28, 2025 |
“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans | Jan 28, 2025 |
“Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell | Jan 27, 2025 |
“A Three-Layer Model of LLM Psychology” by Jan_Kulveit | Jan 26, 2025 |
“Training on Documents About Reward Hacking Induces Reward Hacking” by evhub | Jan 24, 2025 |
“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt | Jan 24, 2025 |
“Mechanisms too simple for humans to design” by Malmesbury | Jan 24, 2025 |
“The Gentle Romance” by Richard_Ngo | Jan 22, 2025 |
“Quotes from the Stargate press conference” by Nikola Jurkovic | Jan 22, 2025 |
“The Case Against AI Control Research” by johnswentworth | Jan 21, 2025 |
“Don’t ignore bad vibes you get from people” by Kaj_Sotala | Jan 20, 2025 |
“[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty” by tandem | Jan 19, 2025 |
“Building AI Research Fleets” by bgold, Jesse Hoogland | Jan 18, 2025 |
“What Is The Alignment Problem?” by johnswentworth | Jan 17, 2025 |
“Applying traditional economic thinking to AGI: a trilemma” by Steven Byrnes | Jan 14, 2025 |
“Passages I Highlighted in The Letters of J.R.R. Tolkien” by Ivan Vendrov | Jan 14, 2025 |
“Parkinson’s Law and the Ideology of Statistics” by Benquo | Jan 13, 2025 |
“Capital Ownership Will Not Prevent Human Disempowerment” by beren | Jan 11, 2025 |
“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq | Jan 10, 2025 |
“What o3 Becomes by 2028” by Vladimir_Nesov | Jan 09, 2025 |
“What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman | Jan 09, 2025 |
“How will we update about scheming?” by ryan_greenblatt | Jan 08, 2025 |
“OpenAI #10: Reflections” by Zvi | Jan 08, 2025 |
“Maximizing Communication, not Traffic” by jefftk | Jan 07, 2025 |
“What’s the short timeline plan?” by Marius Hobbhahn | Jan 02, 2025 |
“Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers | Dec 30, 2024 |
“By default, capital will matter more than ever after AGI” by L Rudolf L | Dec 29, 2024 |
“Review: Planecrash” by L Rudolf L | Dec 28, 2024 |
“The Field of AI Alignment: A Postmortem, and What To Do About It” by johnswentworth | Dec 26, 2024 |
“When Is Insurance Worth It?” by kqr | Dec 23, 2024 |
“Orienting to 3 year AGI timelines” by Nikola Jurkovic | Dec 23, 2024 |
“What Goes Without Saying” by sarahconstantin | Dec 21, 2024 |
“o3” by Zach Stein-Perlman | Dec 21, 2024 |
“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit | Dec 21, 2024 |
“AIs Will Increasingly Attempt Shenanigans” by Zvi | Dec 19, 2024 |
“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck | Dec 18, 2024 |
“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast | Dec 15, 2024 |
“Biological risk from the mirror world” by jasoncrawford | Dec 13, 2024 |
“Subskills of ‘Listening to Wisdom’” by Raemon | Dec 13, 2024 |
“Understanding Shapley Values with Venn Diagrams” by Carson L | Dec 13, 2024 |
“LessWrong audio: help us choose the new voice” by PeterH | Dec 12, 2024 |
“Understanding Shapley Values with Venn Diagrams” by agucova | Dec 11, 2024 |
“o1: A Technical Primer” by Jesse Hoogland | Dec 11, 2024 |
“Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout | Dec 09, 2024 |
“Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen | Dec 06, 2024 |
“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka | Nov 30, 2024 |
“Repeal the Jones Act of 1920” by Zvi | Nov 29, 2024 |
“China Hawks are Manufacturing an AI Arms Race” by garrison | Nov 29, 2024 |
“Information vs Assurance” by johnswentworth | Nov 27, 2024 |
“You are not too ‘irrational’ to know your preferences.” by DaystarEld | Nov 27, 2024 |
“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi | Nov 25, 2024 |
“‘It’s a 10% chance which I did 10 times, so it should be 100%’” by egor.timatkov | Nov 20, 2024 |
“OpenAI Email Archives” by habryka | Nov 19, 2024 |
“Ayn Rand’s model of ‘living money’; and an upside of burnout” by AnnaSalamon | Nov 18, 2024 |
“Neutrality” by sarahconstantin | Nov 17, 2024 |
“Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio | Nov 16, 2024 |
“OpenAI Email Archives (from Musk v. Altman)” by habryka | Nov 16, 2024 |
“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub | Nov 15, 2024 |
“The Online Sports Gambling Experiment Has Failed” by Zvi | Nov 12, 2024 |
“o1 is a bad idea” by abramdemski | Nov 12, 2024 |
“Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale | Nov 09, 2024 |
“Explore More: A Bag of Tricks to Keep Your Life on the Rails” by Shoshannah Tekofsky | Nov 04, 2024 |
“Survival without dignity” by L Rudolf L | Nov 04, 2024 |
“The Median Researcher Problem” by johnswentworth | Nov 04, 2024 |
“The Compendium, A full argument about extinction risk from AGI” by adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell, Andrea_Miotti | Nov 01, 2024 |
“What TMS is like” by Sable | Oct 31, 2024 |
“The hostile telepaths problem” by Valentine | Oct 28, 2024 |
“A bird’s eye view of ARC’s research” by Jacob_Hilton | Oct 27, 2024 |
“A Rocket–Interpretability Analogy” by plex | Oct 25, 2024 |
“I got dysentery so you don’t have to” by eukaryote | Oct 24, 2024 |
“Overcoming Bias Anthology” by Arjun Panickssery | Oct 23, 2024 |
“Arithmetic is an underrated world-modeling technology” by dynomight | Oct 22, 2024 |
“My theory of change for working in AI healthtech” by Andrew_Critch | Oct 15, 2024 |
“Why I’m not a Bayesian” by Richard_Ngo | Oct 15, 2024 |
“The AGI Entente Delusion” by Max Tegmark | Oct 14, 2024 |
“Momentum of Light in Glass” by Ben | Oct 14, 2024 |
“Overview of strong human intelligence amplification methods” by TsviBT | Oct 09, 2024 |
“Struggling like a Shadowmoth” by Raemon | Oct 03, 2024 |
“Three Subtle Examples of Data Leakage” by abstractapplic | Oct 03, 2024 |
“the case for CoT unfaithfulness is overstated” by nostalgebraist | Sep 30, 2024 |
“Cryonics is free” by Mati_Roy | Sep 30, 2024 |
“Stanislav Petrov Quarterly Performance Review” by Ricki Heicklen | Sep 29, 2024 |
“Laziness death spirals” by PatrickDFarley | Sep 29, 2024 |
“‘Slow’ takeoff is a terrible term for ‘maybe even faster takeoff, actually’” by Raemon | Sep 29, 2024 |
“ASIs will not leave just a little sunlight for Earth” by Eliezer Yudkowsky | Sep 23, 2024 |
“Skills from a year of Purposeful Rationality Practice” by Raemon | Sep 21, 2024 |
“How I started believing religion might actually matter for rationality and moral philosophy” by zhukeepa | Sep 19, 2024 |
“Did Christopher Hitchens change his mind about waterboarding?” by Isaac King | Sep 17, 2024 |
“The Great Data Integration Schlep” by sarahconstantin | Sep 15, 2024 |
“Contra papers claiming superhuman AI forecasting” by nikos, Peter Mühlbacher, Lawrence Phillips, dschwarz | Sep 14, 2024 |
“OpenAI o1” by Zach Stein-Perlman | Sep 13, 2024 |
“The Best Lay Argument is not a Simple English Yud Essay” by J Bostock | Sep 11, 2024 |
“My Number 1 Epistemology Book Recommendation: Inventing Temperature” by adamShimi | Sep 10, 2024 |
“That Alien Message - The Animation” by Writer | Sep 09, 2024 |
“Pay Risk Evaluators in Cash, Not Equity” by Adam Scholl | Sep 07, 2024 |
“Survey: How Do Elite Chinese Students Feel About the Risks of AI?” by Nick Corvino | Sep 07, 2024 |
“things that confuse me about the current AI market.” by DMMF | Sep 02, 2024 |
“Nursing doubts” by dynomight | Sep 01, 2024 |
“Principles for the AGI Race” by William_S | Aug 31, 2024 |
“The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it” by Martín Soto | Aug 29, 2024 |
“What is it to solve the alignment problem?” by Joe Carlsmith | Aug 28, 2024 |
“Limitations on Formal Verification for AI Safety” by Andrew Dickson | Aug 27, 2024 |
“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck | Aug 27, 2024 |
“Liability regimes for AI” by Ege Erdil | Aug 23, 2024 |
“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work” by Rohin Shah, Seb Farquhar, Anca Dragan | Aug 21, 2024 |
“Fields that I reference when thinking about AI takeover prevention” by Buck | Aug 15, 2024 |
“WTH is Cerebrolysin, actually?” by gsfitzgerald, delton137 | Aug 13, 2024 |
“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex | Aug 10, 2024 |
“Leaving MIRI, Seeking Funding” by abramdemski | Aug 09, 2024 |
“How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage” by orthonormal | Aug 08, 2024 |
“This is already your second chance” by Malmesbury | Aug 07, 2024 |
“0. CAST: Corrigibility as Singular Target” by Max Harms | Aug 07, 2024 |
“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena | Aug 07, 2024 |
“You don’t know how bad most things are nor precisely how they’re bad.” by Solenoid_Entity | Aug 07, 2024 |
“Recommendation: reports on the search for missing hiker Bill Ewasko” by eukaryote | Aug 07, 2024 |
“The ‘strong’ feature hypothesis could be wrong” by lsgos | Aug 07, 2024 |
“‘AI achieves silver-medal standard solving International Mathematical Olympiad problems’” by gjm | Jul 30, 2024 |
“Decomposing Agency — capabilities without desires” by owencb, Raymond D | Jul 29, 2024 |
“Universal Basic Income and Poverty” by Eliezer Yudkowsky | Jul 27, 2024 |
“Optimistic Assumptions, Longterm Planning, and ‘Cope’” by Raemon | Jul 19, 2024 |
“Superbabies: Putting The Pieces Together” by sarahconstantin | Jul 15, 2024 |
“Poker is a bad game for teaching epistemics. Figgie is a better one.” by rossry | Jul 12, 2024 |
“Reliable Sources: The Story of David Gerard” by TracingWoodgrains | Jul 11, 2024 |
“When is a mind me?” by Rob Bensinger | Jul 08, 2024 |
“80,000 hours should remove OpenAI from the Job Board (and similar orgs should do similarly)” by Raemon | Jul 04, 2024 |
[Linkpost] “introduction to cancer vaccines” by bhauth | Jul 02, 2024 |
“Priors and Prejudice” by MathiasKB | Jul 02, 2024 |
“My experience using financial commitments to overcome akrasia” by William Howard | Jul 02, 2024 |
“The Incredible Fentanyl-Detecting Machine” by sarahconstantin | Jul 01, 2024 |
“AI catastrophes and rogue deployments” by Buck | Jul 01, 2024 |
“Loving a world you don’t trust” by Joe Carlsmith | Jul 01, 2024 |
“Formal verification, heuristic explanations and surprise accounting” by paulfchristiano | Jun 27, 2024 |
“LLM Generality is a Timeline Crux” by eggsyntax | Jun 25, 2024 |
“SAE feature geometry is outside the superposition hypothesis” by jake_mendel | Jun 25, 2024 |
“Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data” by Johannes Treutlein, Owain_Evans | Jun 23, 2024 |
“Boycott OpenAI” by PeterMcCluskey | Jun 21, 2024 |
“Sycophancy to subterfuge: Investigating reward tampering in large language models” by evhub, Carson Denison | Jun 20, 2024 |
“I would have shit in that alley, too” by Declan Molony | Jun 18, 2024 |
“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by ryan_greenblatt | Jun 18, 2024 |
“Why I don’t believe in the placebo effect” by transhumanist_atom_understander | Jun 15, 2024 |
“Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)” by Andrew_Critch | Jun 14, 2024 |
“My AI Model Delta Compared To Christiano” by johnswentworth | Jun 13, 2024 |
“My AI Model Delta Compared To Yudkowsky” by johnswentworth | Jun 10, 2024 |
“Response to Aschenbrenner’s ‘Situational Awareness’” by Rob Bensinger | Jun 07, 2024 |
“Humming is not a free $100 bill” by Elizabeth | Jun 07, 2024 |
“Announcing ILIAD — Theoretical AI Alignment Conference” by Nora_Ammann, Alexander Gietelink Oldenziel | Jun 06, 2024 |
“Non-Disparagement Canaries for OpenAI” by aysja, Adam Scholl | May 31, 2024 |
“MIRI 2024 Communications Strategy” by Gretta Duleba | May 30, 2024 |
“OpenAI: Fallout” by Zvi | May 28, 2024 |
[HUMAN VOICE] Update on human narration for this podcast | May 28, 2024 |
“Maybe Anthropic’s Long-Term Benefit Trust is powerless” by Zach Stein-Perlman | May 28, 2024 |
“Notifications Received in 30 Minutes of Class” by tanagrabeast | May 27, 2024 |
“AI companies aren’t really using external evaluators” by Zach Stein-Perlman | May 24, 2024 |
“EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper | May 24, 2024 |
“What’s Going on With OpenAI’s Messaging?” by ozziegoen | May 22, 2024 |
“Language Models Model Us” by eggsyntax | May 21, 2024 |
Jaan Tallinn’s 2023 Philanthropy Overview | May 21, 2024 |
“OpenAI: Exodus” by Zvi | May 21, 2024 |
DeepMind’s “Frontier Safety Framework” is weak and unambitious | May 20, 2024 |
Do you believe in hundred dollar bills lying on the ground? Consider humming | May 18, 2024 |
Deep Honesty | May 12, 2024 |
On Not Pulling The Ladder Up Behind You | May 02, 2024 |
Mechanistically Eliciting Latent Behaviors in Language Models | May 02, 2024 |
Ironing Out the Squiggles | May 01, 2024 |
Introducing AI Lab Watch | May 01, 2024 |
Refusal in LLMs is mediated by a single direction | Apr 28, 2024 |
Funny Anecdote of Eliezer From His Sister | Apr 24, 2024 |
Thoughts on seed oil | Apr 21, 2024 |
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer | Apr 19, 2024 |
Express interest in an “FHI of the West” | Apr 18, 2024 |
Transformers Represent Belief State Geometry in their Residual Stream | Apr 17, 2024 |
Paul Christiano named as US AI Safety Institute Head of AI Safety | Apr 16, 2024 |
[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer | Apr 12, 2024 |
[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman | Apr 12, 2024 |
[HUMAN VOICE] "Toward a Broader Conception of Adverse Selection" by Ricki Heicklen | Apr 12, 2024 |
[HUMAN VOICE] "On green" by Joe Carlsmith | Apr 12, 2024 |
LLMs for Alignment Research: a safety priority? | Apr 06, 2024 |
[HUMAN VOICE] "Social status part 1/2: negotiations over object-level preferences" by Steven Byrnes | Apr 05, 2024 |
[HUMAN VOICE] "Using axis lines for good or evil" by dynomight | Apr 05, 2024 |
[HUMAN VOICE] "Scale Was All We Needed, At First" by Gabriel Mukobi | Apr 05, 2024 |
[HUMAN VOICE] "Acting Wholesomely" by OwenCB | Apr 05, 2024 |
The Story of “I Have Been A Good Bing” | Apr 01, 2024 |
The Best Tacit Knowledge Videos on Every Subject | Apr 01, 2024 |
[HUMAN VOICE] "My Clients, The Liars" by ymeskhout | Mar 20, 2024 |
[HUMAN VOICE] "Deep atheism and AI risk" by Joe Carlsmith | Mar 20, 2024 |
[HUMAN VOICE] "CFAR Takeaways: Andrew Critch" by Raemon | Mar 10, 2024 |
[HUMAN VOICE] "Speaking to Congressional staffers about AI risk" by Akash, hath | Mar 10, 2024 |
Many arguments for AI x-risk are wrong | Mar 09, 2024 |
Tips for Empirical Alignment Research | Mar 07, 2024 |
Timaeus’s First Four Months | Feb 29, 2024 |
Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party” | Feb 23, 2024 |
[HUMAN VOICE] "And All the Shoggoths Merely Players" by Zack_M_Davis | Feb 20, 2024 |
[HUMAN VOICE] "Updatelessness doesn't solve most problems" by Martín Soto | Feb 20, 2024 |
Every “Every Bay Area House Party” Bay Area House Party | Feb 19, 2024 |
2023 Survey Results | Feb 19, 2024 |
Raising children on the eve of AI | Feb 18, 2024 |
“No-one in my org puts money in their pension” | Feb 18, 2024 |
Masterpiece | Feb 16, 2024 |
CFAR Takeaways: Andrew Critch | Feb 15, 2024 |
[HUMAN VOICE] "Believing In" by Anna Salamon | Feb 14, 2024 |
[HUMAN VOICE] "Attitudes about Applied Rationality" by Camille Berger | Feb 14, 2024 |
Scale Was All We Needed, At First | Feb 14, 2024 |
Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy | Feb 11, 2024 |
[HUMAN VOICE] "A Shutdown Problem Proposal" by johnswentworth, David Lorell | Feb 09, 2024 |
Brute Force Manufactured Consensus is Hiding the Crime of the Century | Feb 04, 2024 |
[HUMAN VOICE] "Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI" by Jeremy Gillen, peterbarnett | Feb 03, 2024 |
Leading The Parade | Feb 02, 2024 |
[HUMAN VOICE] "The case for ensuring that powerful AIs are controlled" by ryan_greenblatt, Buck | Feb 02, 2024 |
Processor clock speeds are not how fast AIs think | Feb 01, 2024 |
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI | Jan 31, 2024 |
Making every researcher seek grants is a broken model | Jan 29, 2024 |
The case for training frontier AIs on Sumerian-only corpus | Jan 28, 2024 |
This might be the last AI Safety Camp | Jan 25, 2024 |
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury | Jan 22, 2024 |
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka | Jan 20, 2024 |
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al | Jan 20, 2024 |
The impossible problem of due process | Jan 17, 2024 |
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith | Jan 14, 2024 |
Introducing Alignment Stress-Testing at Anthropic | Jan 14, 2024 |
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | Jan 13, 2024 |
[HUMAN VOICE] "Meaning & Agency" by Abram Demski | Jan 07, 2024 |
What’s up with LLMs representing XORs of arbitrary features? | Jan 07, 2024 |
Gentleness and the artificial Other | Jan 05, 2024 |
MIRI 2024 Mission and Strategy Update | Jan 05, 2024 |
The Plan - 2023 Version | Jan 04, 2024 |
Apologizing is a Core Rationalist Skill | Jan 03, 2024 |
[HUMAN VOICE] "A case for AI alignment being difficult" by jessicata | Jan 02, 2024 |
The Dark Arts | Jan 01, 2024 |
Critical review of Christiano’s disagreements with Yudkowsky | Dec 28, 2023 |
Most People Don’t Realize We Have No Idea How Our AIs Work | Dec 27, 2023 |
Discussion: Challenges with Unsupervised LLM Knowledge Discovery | Dec 26, 2023 |
Succession | Dec 24, 2023 |
Nonlinear’s Evidence: Debunking False and Misleading Claims | Dec 21, 2023 |
Effective Aspersions: How the Nonlinear Investigation Went Wrong | Dec 20, 2023 |
Constellations are Younger than Continents | Dec 20, 2023 |
The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda | Dec 19, 2023 |
“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity | Dec 18, 2023 |
Is being sexy for your homies? | Dec 17, 2023 |
[HUMAN VOICE] "Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible" by Gene Smith and Kman | Dec 17, 2023 |
[HUMAN VOICE] "Moral Reality Check (a short story)" by jessicata | Dec 15, 2023 |
AI Control: Improving Safety Despite Intentional Subversion | Dec 15, 2023 |
2023 Unofficial LessWrong Census/Survey | Dec 13, 2023 |
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity. | Dec 13, 2023 |
[HUMAN VOICE] "What are the results of more parental supervision and less outdoor play?" by Julia Wise | Dec 13, 2023 |
Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible | Dec 12, 2023 |
re: Yudkowsky on biological materials | Dec 11, 2023 |
Speaking to Congressional staffers about AI risk | Dec 05, 2023 |
[HUMAN VOICE] "Shallow review of live agendas in alignment & safety" by technicalities & Stag | Dec 04, 2023 |
Thoughts on “AI is easy to control” by Pope & Belrose | Dec 02, 2023 |
The 101 Space You Will Always Have With You | Nov 30, 2023 |
[HUMAN VOICE] "Social Dark Matter" by Duncan Sabien | Nov 28, 2023 |
Shallow review of live agendas in alignment & safety | Nov 28, 2023 |
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense | Nov 25, 2023 |
[HUMAN VOICE] "The 6D effect: When companies take risks, one email can be very powerful." by scasper | Nov 23, 2023 |
OpenAI: The Battle of the Board | Nov 22, 2023 |
OpenAI: Facts from a Weekend | Nov 20, 2023 |
Sam Altman fired from OpenAI | Nov 18, 2023 |
Social Dark Matter | Nov 17, 2023 |
"You can just spontaneously call people you haven't met in years" by lc | Nov 17, 2023 |
[HUMAN VOICE] "Thinking By The Clock" by Screwtape | Nov 17, 2023 |
"EA orgs' legal structure inhibits risk taking and information sharing on the margin" by Elizabeth | Nov 17, 2023 |
[HUMAN VOICE] "AI Timelines" by habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil | Nov 17, 2023 |
"Integrity in AI Governance and Advocacy" by habryka, Olivia Jimenez | Nov 17, 2023 |
Loudly Give Up, Don’t Quietly Fade | Nov 16, 2023 |
[HUMAN VOICE] "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds | Nov 09, 2023 |
[HUMAN VOICE] "Deception Chess: Game #1" by Zane et al. | Nov 09, 2023 |
"Does davidad's uploading moonshot work?" by jacobjabob et al. | Nov 09, 2023 |
"The other side of the tidal wave" by Katja Grace | Nov 09, 2023 |
"The 6D effect: When companies take risks, one email can be very powerful." by scasper | Nov 09, 2023 |
Comp Sci in 2027 (Short story by Eliezer Yudkowsky) | Nov 09, 2023 |
"My thoughts on the social response to AI risk" by Matthew Barnett | Nov 09, 2023 |
"Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk" by 1a3orn | Nov 09, 2023 |
"President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence" by Tristan Williams | Nov 03, 2023 |
"Thoughts on the AI Safety Summit company policy requests and responses" by So8res | Nov 03, 2023 |
[HUMAN VOICE] "Book Review: Going Infinite" by Zvi | Oct 31, 2023 |
"Announcing Timaeus" by Jesse Hoogland et al. | Oct 30, 2023 |
"Thoughts on responsible scaling policies and regulation" by Paul Christiano | Oct 30, 2023 |
"AI as a science, and three obstacles to alignment strategies" by Nate Soares | Oct 30, 2023 |
"Architects of Our Own Demise: We Should Stop Developing AI" by Roko | Oct 30, 2023 |
"At 87, Pearl is still able to change his mind" by rotatingpaguro | Oct 30, 2023 |
"We're Not Ready: thoughts on "pausing" and responsible scaling policies" by Holden Karnofsky | Oct 30, 2023 |
[HUMAN VOICE] "Alignment Implications of LLM Successes: a Debate in One Act" by Zack M Davis | Oct 23, 2023 |
"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish | Oct 23, 2023 |
"Holly Elmore and Rob Miles dialogue on AI Safety Advocacy" by jacobjacob, Robert Miles & Holly_Elmore | Oct 23, 2023 |
"Labs should be explicit about why they are building AGI" by Peter Barnett | Oct 19, 2023 |
[HUMAN VOICE] "Sum-threshold attacks" by TsviBT | Oct 18, 2023 |
"Will no one rid me of this turbulent pest?" by Metacelsus | Oct 18, 2023 |
"RSPs are pauses done right" by evhub | Oct 15, 2023 |
[HUMAN VOICE] "Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth | Oct 15, 2023 |
"Cohabitive Games so Far" by mako yass | Oct 15, 2023 |
"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba | Oct 15, 2023 |
"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI | Oct 15, 2023 |
"Announcing Dialogues" by Ben Pace | Oct 09, 2023 |
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds | Oct 09, 2023 |
"Evaluating the historical value misspecification argument" by Matthew Barnett | Oct 09, 2023 |
"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi | Oct 09, 2023 |
"Thomas Kwa's MIRI research experience" by Thomas Kwa and others | Oct 06, 2023 |
"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth | Oct 03, 2023 |
"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al. | Oct 03, 2023 |
"The Lighthaven Campus is open for bookings" by Habryka | Oct 03, 2023 |
"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal | Oct 03, 2023 |
"The King and the Golem" by Richard Ngo | Sep 29, 2023 |
"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al | Sep 27, 2023 |
"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth | Sep 26, 2023 |
"There should be more AI safety orgs" by Marius Hobbhahn | Sep 25, 2023 |
"The Talk: a brief explanation of sexual dimorphism" by Malmesbury | Sep 22, 2023 |
"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob | Sep 20, 2023 |
"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker | Sep 19, 2023 |
"UDT shows that decision theory is more puzzling than ever" by Wei Dai | Sep 18, 2023 |
"Sum-threshold attacks" by TsviBT | Sep 11, 2023 |
"A list of core AI safety problems and how I hope to solve them" by Davidad | Sep 09, 2023 |
"Report on Frontier Model Training" by Yafah Edelman | Sep 09, 2023 |
"Defunding My Mistake" by ymeskhout | Sep 08, 2023 |
"Sharing Information About Nonlinear" by Ben Pace | Sep 08, 2023 |
"One Minute Every Moment" by abramdemski | Sep 08, 2023 |
"What I would do if I wasn’t at ARC Evals" by LawrenceC | Sep 08, 2023 |
"The U.S. is becoming less stable" by lc | Sep 04, 2023 |
"Meta Questions about Metaphilosophy" by Wei Dai | Sep 04, 2023 |
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist | Sep 04, 2023 |
"Dear Self; we need to talk about ambition" by Elizabeth | Aug 30, 2023 |
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon | Aug 28, 2023 |
"Assume Bad Faith" by Zack_M_Davis | Aug 28, 2023 |
"Large Language Models will be Great for Censorship" by Ethan Edwards | Aug 23, 2023 |
"Ten Thousand Years of Solitude" by agp | Aug 22, 2023 |
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov | Aug 22, 2023 |
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël | Aug 21, 2023 |
"Inflection.ai is a major AGI lab" by Nikola | Aug 15, 2023 |
"Feedbackloop-first Rationality" by Raemon | Aug 15, 2023 |
"When can we trust model evaluations?" by evhub | Aug 09, 2023 |
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez | Aug 09, 2023 |
"My current LK99 questions" by Eliezer Yudkowsky | Aug 04, 2023 |
"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long | Aug 04, 2023 |
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes | Aug 04, 2023 |
"Thoughts on sharing information about language model capabilities" by paulfchristiano | Aug 02, 2023 |
"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth | Jul 31, 2023 |
"Self-driving car bets" by paulfchristiano | Jul 31, 2023 |
"Cultivating a state of mind where new ideas are born" by Henrik Karlsson | Jul 31, 2023 |
"Rationality !== Winning" by Raemon | Jul 28, 2023 |
"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel | Jul 28, 2023 |
"Grant applications and grand narratives" by Elizabeth | Jul 28, 2023 |
"Cryonics and Regret" by MvB | Jul 28, 2023 |
"Unifying Bargaining Notions (2/2)" by Diffractor | Jun 12, 2023 |
"The ants and the grasshopper" by Richard Ngo | Jun 06, 2023 |
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al. | May 18, 2023 |
"An artificially structured argument for expecting AGI ruin" by Rob Bensinger | May 16, 2023 |
"How much do you believe your results?" by Eric Neyman | May 10, 2023 |
"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango | Apr 27, 2023 |
"On AutoGPT" by Zvi | Apr 19, 2023 |
"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky | Apr 12, 2023 |
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky | Apr 05, 2023 |
"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares | Apr 05, 2023 |
"Deep Deceptiveness" by Nate Soares | Apr 05, 2023 |
"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch | Mar 28, 2023 |
"There’s no such thing as a tree (phylogenetically)" by Eukaryote | Mar 28, 2023 |
"Losing the root for the tree" by Adam Zerner | Mar 28, 2023 |
"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien | Mar 28, 2023 |
"Why I think strong general AI is coming soon" by Porby | Mar 28, 2023 |
"It Looks Like You’re Trying To Take Over The World" by Gwern | Mar 28, 2023 |
"What failure looks like" by Paul Christiano | Mar 28, 2023 |
"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes | Mar 21, 2023 |
""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon | Mar 21, 2023 |
"The Parable of the King and the Random Process" by moridinamael | Mar 14, 2023 |
"Enemies vs Malefactors" by Nate Soares | Mar 14, 2023 |
"The Waluigi Effect (mega-post)" by Cleo Nardo | Mar 08, 2023 |
"Acausal normalcy" by Andrew Critch | Mar 06, 2023 |
"Please don't throw your mind away" by TsviBT | Mar 01, 2023 |
"Cyborgism" by Nicholas Kees & Janus | Feb 15, 2023 |
"Childhoods of exceptional people" by Henrik Karlsson | Feb 14, 2023 |
"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares | Feb 13, 2023 |
"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça | Feb 10, 2023 |
"SolidGoldMagikarp (plus, prompt generation)" | Feb 08, 2023 |
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares | Feb 03, 2023 |
"Basics of Rationalist Discourse" by Duncan Sabien | Feb 02, 2023 |
"Sapir-Whorf for Rationalists" by Duncan Sabien | Jan 31, 2023 |
"My Model Of EA Burnout" by Logan Strohl | Jan 31, 2023 |
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo | Jan 25, 2023 |
"Recursive Middle Manager Hell" by Raemon | Jan 24, 2023 |
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin | Jan 12, 2023 |
"Models Don't 'Get Reward'" by Sam Ringer | Jan 12, 2023 |
"The Feeling of Idea Scarcity" by John Wentworth | Jan 12, 2023 |
"The next decades might be wild" by Marius Hobbhahn | Dec 21, 2022 |
"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn | Nov 17, 2022 |
"How my team at Lightcone sometimes gets stuff done" by jacobjacob | Nov 10, 2022 |
"Decision theory does not imply that we get to have nice things" by So8res | Nov 08, 2022 |
"What 2026 looks like" by Daniel Kokotajlo | Nov 07, 2022 |
Counterarguments to the basic AI x-risk case | Nov 04, 2022 |
"Introduction to abstract entropy" by Alex Altair | Oct 29, 2022 |
"Consider your appetite for disagreements" by Adam Zerner | Oct 25, 2022 |
"My resentful story of becoming a medical miracle" by Elizabeth | Oct 21, 2022 |
"The Redaction Machine" by Ben | Oct 02, 2022 |
"Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" by Ajeya Cotra | Sep 27, 2022 |
"The shard theory of human values" by Quintin Pope & TurnTrout | Sep 22, 2022 |
"Two-year update on my personal AI timelines" by Ajeya Cotra | Sep 22, 2022 |
"You Are Not Measuring What You Think You Are Measuring" by John Wentworth | Sep 21, 2022 |
"Do bamboos set themselves on fire?" by Malmesbury | Sep 20, 2022 |
"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith | Sep 18, 2022 |
"Deliberate Grieving" by Raemon | Sep 18, 2022 |
"Survey advice" by Katja Grace | Sep 18, 2022 |
"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC | Sep 15, 2022 |
"Humans are not automatically strategic" by Anna Salamon | Sep 15, 2022 |
"Local Validity as a Key to Sanity and Civilization" by Eliezer Yudkowsky | Sep 15, 2022 |
"Toolbox-thinking and Law-thinking" by Eliezer Yudkowsky | Sep 15, 2022 |
"Moral strategies at different capability levels" by Richard Ngo | Sep 14, 2022 |
"Worlds Where Iterative Design Fails" by John Wentworth | Sep 11, 2022 |
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland | Sep 11, 2022 |
"Unifying Bargaining Notions (1/2)" by Diffractor | Sep 09, 2022 |
'Simulators' by Janus | Sep 05, 2022 |
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope | Aug 08, 2022 |
"Changing the world through slack & hobbies" by Steven Byrnes | Jul 30, 2022 |
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch | Jul 28, 2022 |
"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger | Jul 24, 2022 |
"What should you change in response to an "emergency"? And AI risk" by Anna Salamon | Jul 23, 2022 |
"On how various plans miss the hard bits of the alignment challenge" by Nate Soares | Jul 17, 2022 |
"Humans are very reliable agents" by Alyssa Vance | Jul 13, 2022 |
"Looking back on my alignment PhD" by TurnTrout | Jul 08, 2022 |
"It’s Probably Not Lithium" by Natália Coelho Mendonça | Jul 05, 2022 |
"What Are You Tracking In Your Head?" by John Wentworth | Jul 02, 2022 |
"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood | Jun 29, 2022 |
"Where I agree and disagree with Eliezer" by Paul Christiano | Jun 22, 2022 |
"Six Dimensions of Operational Adequacy in AGI Projects" by Eliezer Yudkowsky | Jun 21, 2022 |
"Moses and the Class Struggle" by lsusr | Jun 21, 2022 |
"Benign Boundary Violations" by Duncan Sabien | Jun 20, 2022 |
"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky | Jun 20, 2022 |