LessWrong (Curated & Popular)

By LessWrong


Category: Technology


Subscribers: 16
Reviews: 0
Episodes: 460

Description

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Episode / Date
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub
Mar 16, 2025
“The Most Forbidden Technique” by Zvi
Mar 14, 2025
“Trojan Sky” by Richard_Ngo
Mar 13, 2025
“OpenAI:” by Daniel Kokotajlo
Mar 11, 2025
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis
Mar 09, 2025
“So how well is Claude playing Pokémon?” by Julian Bradshaw
Mar 09, 2025
“Methods for strong human germline engineering” by TsviBT
Mar 07, 2025
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth
Mar 06, 2025
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis
Mar 06, 2025
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard
Mar 05, 2025
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout
Mar 04, 2025
“Judgements: Merging Prediction & Evidence” by abramdemski
Mar 01, 2025
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis
Feb 26, 2025
“Power Lies Trembling: a three-book review” by Richard_Ngo
Feb 26, 2025
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans
Feb 26, 2025
“The Paris AI Anti-Safety Summit” by Zvi
Feb 22, 2025
“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby
Feb 20, 2025
“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby
Feb 20, 2025
“How to Make Superbabies” by GeneSmith, kman
Feb 20, 2025
“A computational no-coincidence principle” by Eric Neyman
Feb 19, 2025
“A History of the Future, 2025-2040” by L Rudolf L
Feb 19, 2025
“It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape
Feb 18, 2025
“Some articles in ‘International Security’ that I enjoyed” by Buck
Feb 16, 2025
“The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace
Feb 16, 2025
“Murder plots are infohazards” by Chris Monteiro
Feb 14, 2025
“Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison
Feb 11, 2025
“The ‘Think It Faster’ Exercise” by Raemon
Feb 09, 2025
“So You Want To Make Marginal Progress...” by johnswentworth
Feb 08, 2025
“What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus
Feb 08, 2025
“How AI Takeover Might Happen in 2 Years” by joshc
Feb 08, 2025
“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit
Feb 05, 2025
“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud
Feb 04, 2025
“Planning for Extreme AI Risks” by joshc
Feb 03, 2025
“Catastrophe through Chaos” by Marius Hobbhahn
Feb 03, 2025
“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt
Feb 01, 2025
“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes
Jan 30, 2025
“Ten people on the inside” by Buck
Jan 29, 2025
“Anomalous Tokens in DeepSeek-V3 and r1” by henry
Jan 28, 2025
“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans
Jan 28, 2025
“Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell
Jan 27, 2025
“A Three-Layer Model of LLM Psychology” by Jan_Kulveit
Jan 26, 2025
“Training on Documents About Reward Hacking Induces Reward Hacking” by evhub
Jan 24, 2025
“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt
Jan 24, 2025
“Mechanisms too simple for humans to design” by Malmesbury
Jan 24, 2025
“The Gentle Romance” by Richard_Ngo
Jan 22, 2025
“Quotes from the Stargate press conference” by Nikola Jurkovic
Jan 22, 2025
“The Case Against AI Control Research” by johnswentworth
Jan 21, 2025
“Don’t ignore bad vibes you get from people” by Kaj_Sotala
Jan 20, 2025
“[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty” by tandem
Jan 19, 2025
“Building AI Research Fleets” by bgold, Jesse Hoogland
Jan 18, 2025
“What Is The Alignment Problem?” by johnswentworth
Jan 17, 2025
“Applying traditional economic thinking to AGI: a trilemma” by Steven Byrnes
Jan 14, 2025
“Passages I Highlighted in The Letters of J.R.R. Tolkien” by Ivan Vendrov
Jan 14, 2025
“Parkinson’s Law and the Ideology of Statistics” by Benquo
Jan 13, 2025
“Capital Ownership Will Not Prevent Human Disempowerment” by beren
Jan 11, 2025
“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq
Jan 10, 2025
“What o3 Becomes by 2028” by Vladimir_Nesov
Jan 09, 2025
“What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman
Jan 09, 2025
“How will we update about scheming?” by ryan_greenblatt
Jan 08, 2025
“OpenAI #10: Reflections” by Zvi
Jan 08, 2025
“Maximizing Communication, not Traffic” by jefftk
Jan 07, 2025
“What’s the short timeline plan?” by Marius Hobbhahn
Jan 02, 2025
“Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
Dec 30, 2024
“By default, capital will matter more than ever after AGI” by L Rudolf L
Dec 29, 2024
“Review: Planecrash” by L Rudolf L
Dec 28, 2024
“The Field of AI Alignment: A Postmortem, and What To Do About It” by johnswentworth
Dec 26, 2024
“When Is Insurance Worth It?” by kqr
Dec 23, 2024
“Orienting to 3 year AGI timelines” by Nikola Jurkovic
Dec 23, 2024
“What Goes Without Saying” by sarahconstantin
Dec 21, 2024
“o3” by Zach Stein-Perlman
Dec 21, 2024
“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit
Dec 21, 2024
“AIs Will Increasingly Attempt Shenanigans” by Zvi
Dec 19, 2024
“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck
Dec 18, 2024
“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast
Dec 15, 2024
“Biological risk from the mirror world” by jasoncrawford
Dec 13, 2024
“Subskills of ‘Listening to Wisdom’” by Raemon
Dec 13, 2024
“Understanding Shapley Values with Venn Diagrams” by Carson L
Dec 13, 2024
“LessWrong audio: help us choose the new voice” by PeterH
Dec 12, 2024
“Understanding Shapley Values with Venn Diagrams” by agucova
Dec 11, 2024
“o1: A Technical Primer” by Jesse Hoogland
Dec 11, 2024
“Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout
Dec 09, 2024
“Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen
Dec 06, 2024
“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka
Nov 30, 2024
“Repeal the Jones Act of 1920” by Zvi
Nov 29, 2024
“China Hawks are Manufacturing an AI Arms Race” by garrison
Nov 29, 2024
“Information vs Assurance” by johnswentworth
Nov 27, 2024
“You are not too ‘irrational’ to know your preferences.” by DaystarEld
Nov 27, 2024
“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi
Nov 25, 2024
“‘It’s a 10% chance which I did 10 times, so it should be 100%’” by egor.timatkov
Nov 20, 2024
“OpenAI Email Archives” by habryka
Nov 19, 2024
“Ayn Rand’s model of ‘living money’; and an upside of burnout” by AnnaSalamon
Nov 18, 2024
“Neutrality” by sarahconstantin
Nov 17, 2024
“Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio
Nov 16, 2024
“OpenAI Email Archives (from Musk v. Altman)” by habryka
Nov 16, 2024
“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub
Nov 15, 2024
“The Online Sports Gambling Experiment Has Failed” by Zvi
Nov 12, 2024
“o1 is a bad idea” by abramdemski
Nov 12, 2024
“Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale
Nov 09, 2024
“Explore More: A Bag of Tricks to Keep Your Life on the Rails” by Shoshannah Tekofsky
Nov 04, 2024
“Survival without dignity” by L Rudolf L
Nov 04, 2024
“The Median Researcher Problem” by johnswentworth
Nov 04, 2024
“The Compendium, A full argument about extinction risk from AGI” by adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell, Andrea_Miotti
Nov 01, 2024
“What TMS is like” by Sable
Oct 31, 2024
“The hostile telepaths problem” by Valentine
Oct 28, 2024
“A bird’s eye view of ARC’s research” by Jacob_Hilton
Oct 27, 2024
“A Rocket–Interpretability Analogy” by plex
Oct 25, 2024
“I got dysentery so you don’t have to” by eukaryote
Oct 24, 2024
“Overcoming Bias Anthology” by Arjun Panickssery
Oct 23, 2024
“Arithmetic is an underrated world-modeling technology” by dynomight
Oct 22, 2024
“My theory of change for working in AI healthtech” by Andrew_Critch
Oct 15, 2024
“Why I’m not a Bayesian” by Richard_Ngo
Oct 15, 2024
“The AGI Entente Delusion” by Max Tegmark
Oct 14, 2024
“Momentum of Light in Glass” by Ben
Oct 14, 2024
“Overview of strong human intelligence amplification methods” by TsviBT
Oct 09, 2024
“Struggling like a Shadowmoth” by Raemon
Oct 03, 2024
“Three Subtle Examples of Data Leakage” by abstractapplic
Oct 03, 2024
“the case for CoT unfaithfulness is overstated” by nostalgebraist
Sep 30, 2024
“Cryonics is free” by Mati_Roy
Sep 30, 2024
“Stanislav Petrov Quarterly Performance Review” by Ricki Heicklen
Sep 29, 2024
“Laziness death spirals” by PatrickDFarley
Sep 29, 2024
“‘Slow’ takeoff is a terrible term for ‘maybe even faster takeoff, actually’” by Raemon
Sep 29, 2024
“ASIs will not leave just a little sunlight for Earth” by Eliezer Yudkowsky
Sep 23, 2024
“Skills from a year of Purposeful Rationality Practice” by Raemon
Sep 21, 2024
“How I started believing religion might actually matter for rationality and moral philosophy” by zhukeepa
Sep 19, 2024
“Did Christopher Hitchens change his mind about waterboarding?” by Isaac King
Sep 17, 2024
“The Great Data Integration Schlep” by sarahconstantin
Sep 15, 2024
“Contra papers claiming superhuman AI forecasting” by nikos, Peter Mühlbacher, Lawrence Phillips, dschwarz
Sep 14, 2024
“OpenAI o1” by Zach Stein-Perlman
Sep 13, 2024
“The Best Lay Argument is not a Simple English Yud Essay” by J Bostock
Sep 11, 2024
“My Number 1 Epistemology Book Recommendation: Inventing Temperature” by adamShimi
Sep 10, 2024
“That Alien Message - The Animation” by Writer
Sep 09, 2024
“Pay Risk Evaluators in Cash, Not Equity” by Adam Scholl
Sep 07, 2024
“Survey: How Do Elite Chinese Students Feel About the Risks of AI?” by Nick Corvino
Sep 07, 2024
“things that confuse me about the current AI market.” by DMMF
Sep 02, 2024
“Nursing doubts” by dynomight
Sep 01, 2024
“Principles for the AGI Race” by William_S
Aug 31, 2024
“The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it” by Martín Soto
Aug 29, 2024
“What is it to solve the alignment problem?” by Joe Carlsmith
Aug 28, 2024
“Limitations on Formal Verification for AI Safety” by Andrew Dickson
Aug 27, 2024
“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck
Aug 27, 2024
“Liability regimes for AI” by Ege Erdil
Aug 23, 2024
“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work” by Rohin Shah, Seb Farquhar, Anca Dragan
Aug 21, 2024
“Fields that I reference when thinking about AI takeover prevention” by Buck
Aug 15, 2024
“WTH is Cerebrolysin, actually?” by gsfitzgerald, delton137
Aug 13, 2024
“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex
Aug 10, 2024
“Leaving MIRI, Seeking Funding” by abramdemski
Aug 09, 2024
“How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage” by orthonormal
Aug 08, 2024
“This is already your second chance” by Malmesbury
Aug 07, 2024
“0. CAST: Corrigibility as Singular Target” by Max Harms
Aug 07, 2024
“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena
Aug 07, 2024
“You don’t know how bad most things are nor precisely how they’re bad.” by Solenoid_Entity
Aug 07, 2024
“Recommendation: reports on the search for missing hiker Bill Ewasko” by eukaryote
Aug 07, 2024
“The ‘strong’ feature hypothesis could be wrong” by lsgos
Aug 07, 2024
“‘AI achieves silver-medal standard solving International Mathematical Olympiad problems’” by gjm
Jul 30, 2024
“Decomposing Agency — capabilities without desires” by owencb, Raymond D
Jul 29, 2024
“Universal Basic Income and Poverty” by Eliezer Yudkowsky
Jul 27, 2024
“Optimistic Assumptions, Longterm Planning, and ‘Cope’” by Raemon
Jul 19, 2024
“Superbabies: Putting The Pieces Together” by sarahconstantin
Jul 15, 2024
“Poker is a bad game for teaching epistemics. Figgie is a better one.” by rossry
Jul 12, 2024
“Reliable Sources: The Story of David Gerard” by TracingWoodgrains
Jul 11, 2024
“When is a mind me?” by Rob Bensinger
Jul 08, 2024
“80,000 hours should remove OpenAI from the Job Board (and similar orgs should do similarly)” by Raemon
Jul 04, 2024
[Linkpost] “introduction to cancer vaccines” by bhauth
Jul 02, 2024
“Priors and Prejudice” by MathiasKB
Jul 02, 2024
“My experience using financial commitments to overcome akrasia” by William Howard
Jul 02, 2024
“The Incredible Fentanyl-Detecting Machine” by sarahconstantin
Jul 01, 2024
“AI catastrophes and rogue deployments” by Buck
Jul 01, 2024
“Loving a world you don’t trust” by Joe Carlsmith
Jul 01, 2024
“Formal verification, heuristic explanations and surprise accounting” by paulfchristiano
Jun 27, 2024
“LLM Generality is a Timeline Crux” by eggsyntax
Jun 25, 2024
“SAE feature geometry is outside the superposition hypothesis” by jake_mendel
Jun 25, 2024
“Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data” by Johannes Treutlein, Owain_Evans
Jun 23, 2024
“Boycott OpenAI” by PeterMcCluskey
Jun 21, 2024
“Sycophancy to subterfuge: Investigating reward tampering in large language models” by evhub, Carson Denison
Jun 20, 2024
“I would have shit in that alley, too” by Declan Molony
Jun 18, 2024
“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by ryan_greenblatt
Jun 18, 2024
“Why I don’t believe in the placebo effect” by transhumanist_atom_understander
Jun 15, 2024
“Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)” by Andrew_Critch
Jun 14, 2024
“My AI Model Delta Compared To Christiano” by johnswentworth
Jun 13, 2024
“My AI Model Delta Compared To Yudkowsky” by johnswentworth
Jun 10, 2024
“Response to Aschenbrenner’s ‘Situational Awareness’” by Rob Bensinger
Jun 07, 2024
“Humming is not a free $100 bill” by Elizabeth
Jun 07, 2024
“Announcing ILIAD — Theoretical AI Alignment Conference” by Nora_Ammann, Alexander Gietelink Oldenziel
Jun 06, 2024
“Non-Disparagement Canaries for OpenAI” by aysja, Adam Scholl
May 31, 2024
“MIRI 2024 Communications Strategy” by Gretta Duleba
May 30, 2024
“OpenAI: Fallout” by Zvi
May 28, 2024
[HUMAN VOICE] Update on human narration for this podcast
May 28, 2024
“Maybe Anthropic’s Long-Term Benefit Trust is powerless” by Zach Stein-Perlman
May 28, 2024
“Notifications Received in 30 Minutes of Class” by tanagrabeast
May 27, 2024
“AI companies aren’t really using external evaluators” by Zach Stein-Perlman
May 24, 2024
“EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper
May 24, 2024
“What’s Going on With OpenAI’s Messaging?” by ozziegoen
May 22, 2024
“Language Models Model Us” by eggsyntax
May 21, 2024
Jaan Tallinn’s 2023 Philanthropy Overview
May 21, 2024
“OpenAI: Exodus” by Zvi
May 21, 2024
DeepMind’s “Frontier Safety Framework” is weak and unambitious
May 20, 2024
Do you believe in hundred dollar bills lying on the ground? Consider humming
May 18, 2024
Deep Honesty
May 12, 2024
On Not Pulling The Ladder Up Behind You
May 02, 2024
Mechanistically Eliciting Latent Behaviors in Language Models
May 02, 2024
Ironing Out the Squiggles
May 01, 2024
Introducing AI Lab Watch
May 01, 2024
Refusal in LLMs is mediated by a single direction
Apr 28, 2024
Funny Anecdote of Eliezer From His Sister
Apr 24, 2024
Thoughts on seed oil
Apr 21, 2024
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
Apr 19, 2024
Express interest in an “FHI of the West”
Apr 18, 2024
Transformers Represent Belief State Geometry in their Residual Stream
Apr 17, 2024
Paul Christiano named as US AI Safety Institute Head of AI Safety
Apr 16, 2024
[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer
Apr 12, 2024
[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman
Apr 12, 2024
[HUMAN VOICE] "Toward a Broader Conception of Adverse Selection" by Ricki Heicklen
Apr 12, 2024
[HUMAN VOICE] "On green" by Joe Carlsmith
Apr 12, 2024
LLMs for Alignment Research: a safety priority?
Apr 06, 2024
[HUMAN VOICE] "Social status part 1/2: negotiations over object-level preferences" by Steven Byrnes
Apr 05, 2024
[HUMAN VOICE] "Using axis lines for good or evil" by dynomight
Apr 05, 2024
[HUMAN VOICE] "Scale Was All We Needed, At First" by Gabriel Mukobi
Apr 05, 2024
[HUMAN VOICE] "Acting Wholesomely" by OwenCB
Apr 05, 2024
The Story of “I Have Been A Good Bing”
Apr 01, 2024
The Best Tacit Knowledge Videos on Every Subject
Apr 01, 2024
[HUMAN VOICE] "My Clients, The Liars" by ymeskhout
Mar 20, 2024
[HUMAN VOICE] "Deep atheism and AI risk" by Joe Carlsmith
Mar 20, 2024
[HUMAN VOICE] "CFAR Takeaways: Andrew Critch" by Raemon
Mar 10, 2024
[HUMAN VOICE] "Speaking to Congressional staffers about AI risk" by Akash, hath
Mar 10, 2024
Many arguments for AI x-risk are wrong
Mar 09, 2024
Tips for Empirical Alignment Research
Mar 07, 2024
Timaeus’s First Four Months
Feb 29, 2024
Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Feb 23, 2024
[HUMAN VOICE] "And All the Shoggoths Merely Players" by Zack_M_Davis
Feb 20, 2024
[HUMAN VOICE] "Updatelessness doesn't solve most problems" by Martín Soto
Feb 20, 2024
Every “Every Bay Area House Party” Bay Area House Party
Feb 19, 2024
2023 Survey Results
Feb 19, 2024
Raising children on the eve of AI
Feb 18, 2024
“No-one in my org puts money in their pension”
Feb 18, 2024
Masterpiece
Feb 16, 2024
CFAR Takeaways: Andrew Critch
Feb 15, 2024
[HUMAN VOICE] "Believing In" by Anna Salamon
Feb 14, 2024
[HUMAN VOICE] "Attitudes about Applied Rationality" by Camille Berger
Feb 14, 2024
Scale Was All We Needed, At First
Feb 14, 2024
Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
Feb 11, 2024
[HUMAN VOICE] "A Shutdown Problem Proposal" by johnswentworth, David Lorell
Feb 09, 2024
Brute Force Manufactured Consensus is Hiding the Crime of the Century
Feb 04, 2024
[HUMAN VOICE] "Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI" by Jeremy Gillen, peterbarnett
Feb 03, 2024
Leading The Parade
Feb 02, 2024
[HUMAN VOICE] "The case for ensuring that powerful AIs are controlled" by ryan_greenblatt, Buck
Feb 02, 2024
Processor clock speeds are not how fast AIs think
Feb 01, 2024
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jan 31, 2024
Making every researcher seek grants is a broken model
Jan 29, 2024
The case for training frontier AIs on Sumerian-only corpus
Jan 28, 2024
This might be the last AI Safety Camp
Jan 25, 2024
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Jan 22, 2024
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Jan 20, 2024
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
Jan 20, 2024
The impossible problem of due process
Jan 17, 2024
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith
Jan 14, 2024
Introducing Alignment Stress-Testing at Anthropic
Jan 14, 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Jan 13, 2024
[HUMAN VOICE] "Meaning & Agency" by Abram Demski
Jan 07, 2024
What’s up with LLMs representing XORs of arbitrary features?
Jan 07, 2024
Gentleness and the artificial Other
Jan 05, 2024
MIRI 2024 Mission and Strategy Update
Jan 05, 2024
The Plan - 2023 Version
Jan 04, 2024
Apologizing is a Core Rationalist Skill
Jan 03, 2024
[HUMAN VOICE] "A case for AI alignment being difficult" by jessicata
Jan 02, 2024
The Dark Arts
Jan 01, 2024
Critical review of Christiano’s disagreements with Yudkowsky
Dec 28, 2023
Most People Don’t Realize We Have No Idea How Our AIs Work
Dec 27, 2023
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Dec 26, 2023
Succession
Dec 24, 2023
Nonlinear’s Evidence: Debunking False and Misleading Claims
Dec 21, 2023
Effective Aspersions: How the Nonlinear Investigation Went Wrong
Dec 20, 2023
Constellations are Younger than Continents
Dec 20, 2023
The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda
Dec 19, 2023
“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity
Dec 18, 2023
Is being sexy for your homies?
Dec 17, 2023
[HUMAN VOICE] "Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible" by Gene Smith and Kman
Dec 17, 2023
[HUMAN VOICE] "Moral Reality Check (a short story)" by jessicata
Dec 15, 2023
AI Control: Improving Safety Despite Intentional Subversion
Dec 15, 2023
2023 Unofficial LessWrong Census/Survey
Dec 13, 2023
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
Dec 13, 2023
[HUMAN VOICE] "What are the results of more parental supervision and less outdoor play?" by Julia Wise
Dec 13, 2023
Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible
Dec 12, 2023
re: Yudkowsky on biological materials
Dec 11, 2023
Speaking to Congressional staffers about AI risk
Dec 05, 2023
[HUMAN VOICE] "Shallow review of live agendas in alignment & safety" by technicalities & Stag
Dec 04, 2023
Thoughts on “AI is easy to control” by Pope & Belrose
Dec 02, 2023
The 101 Space You Will Always Have With You
Nov 30, 2023
[HUMAN VOICE] "Social Dark Matter" by Duncan Sabien
Nov 28, 2023
Shallow review of live agendas in alignment & safety
Nov 28, 2023
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
Nov 25, 2023
[HUMAN VOICE] "The 6D effect: When companies take risks, one email can be very powerful." by scasper
Nov 23, 2023
OpenAI: The Battle of the Board
Nov 22, 2023
OpenAI: Facts from a Weekend
Nov 20, 2023
Sam Altman fired from OpenAI
Nov 18, 2023
Social Dark Matter
Nov 17, 2023
"You can just spontaneously call people you haven't met in years" by lc
Nov 17, 2023
[HUMAN VOICE] "Thinking By The Clock" by Screwtape
Nov 17, 2023
"EA orgs' legal structure inhibits risk taking and information sharing on the margin" by Elizabeth
Nov 17, 2023
[HUMAN VOICE] "AI Timelines" by habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil
Nov 17, 2023
"Integrity in AI Governance and Advocacy" by habryka, Olivia Jimenez
Nov 17, 2023
Loudly Give Up, Don’t Quietly Fade
Nov 16, 2023
[HUMAN VOICE] "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
Nov 09, 2023
[HUMAN VOICE] "Deception Chess: Game #1" by Zane et al.
Nov 09, 2023
"Does davidad's uploading moonshot work?" by jacobjabob et al.
Nov 09, 2023
"The other side of the tidal wave" by Katja Grace
Nov 09, 2023
"The 6D effect: When companies take risks, one email can be very powerful." by scasper
Nov 09, 2023
Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
Nov 09, 2023
"My thoughts on the social response to AI risk" by Matthew Barnett
Nov 09, 2023
"Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk" by 1a3orn
Nov 09, 2023
"President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence" by Tristan Williams
Nov 03, 2023
"Thoughts on the AI Safety Summit company policy requests and responses" by So8res
Nov 03, 2023
[Human Voice] "Book Review: Going Infinite" by Zvi
Oct 31, 2023
"Announcing Timaeus" by Jesse Hoogland et al.
Oct 30, 2023
"Thoughts on responsible scaling policies and regulation" by Paul Christiano
Oct 30, 2023
"AI as a science, and three obstacles to alignment strategies" by Nate Soares
Oct 30, 2023
"Architects of Our Own Demise: We Should Stop Developing AI" by Roko
Oct 30, 2023
"At 87, Pearl is still able to change his mind" by rotatingpaguro
Oct 30, 2023
"We're Not Ready: thoughts on "pausing" and responsible scaling policies" by Holden Karnofsky
Oct 30, 2023
[HUMAN VOICE] "Alignment Implications of LLM Successes: a Debate in One Act" by Zack M Davis
Oct 23, 2023
"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.
Oct 23, 2023
"Holly Elmore and Rob Miles dialogue on AI Safety Advocacy" by jacobjacob, Robert Miles & Holly_Elmore
Oct 23, 2023
"Labs should be explicit about why they are building AGI" by Peter Barnett
Oct 19, 2023
[HUMAN VOICE] "Sum-threshold attacks" by TsviBT
Oct 18, 2023
"Will no one rid me of this turbulent pest?" by Metacelsus
Oct 18, 2023
"RSPs are pauses done right" by evhub
Oct 15, 2023
[HUMAN VOICE] "Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
Oct 15, 2023
"Cohabitive Games so Far" by mako yass
Oct 15, 2023
"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba
Oct 15, 2023
"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI
Oct 15, 2023
"Announcing Dialogues" by Ben Pace
Oct 09, 2023
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
Oct 09, 2023
"Evaluating the historical value misspecification argument" by Matthew Barnett
Oct 09, 2023
"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi
Oct 09, 2023
"Thomas Kwa's MIRI research experience" by Thomas Kwa and others
Oct 06, 2023
"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth
Oct 03, 2023
"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.
Oct 03, 2023
"The Lighthaven Campus is open for bookings" by Habryka
Oct 03, 2023
"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal
Oct 03, 2023
"The King and the Golem" by Richard Ngo
Sep 29, 2023
"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al
Sep 27, 2023
"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
Sep 26, 2023
"There should be more AI safety orgs" by Marius Hobbhahn
Sep 25, 2023
"The Talk: a brief explanation of sexual dimorphism" by Malmesbury
Sep 22, 2023
"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob
Sep 20, 2023
"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker
Sep 19, 2023
"UDT shows that decision theory is more puzzling than ever" by Wei Dai
Sep 18, 2023
"Sum-threshold attacks" by TsviBT
Sep 11, 2023
"A list of core AI safety problems and how I hope to solve them" by Davidad
Sep 09, 2023
"Report on Frontier Model Training" by Yafah Edelman
Sep 09, 2023
"Defunding My Mistake" by ymeskhout
Sep 08, 2023
"Sharing Information About Nonlinear" by Ben Pace
Sep 08, 2023
"One Minute Every Moment" by abramdemski
Sep 08, 2023
"What I would do if I wasn’t at ARC Evals" by LawrenceC
Sep 08, 2023
"The U.S. is becoming less stable" by lc
Sep 04, 2023
"Meta Questions about Metaphilosophy" by Wei Dai
Sep 04, 2023
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
Sep 04, 2023
"Dear Self; we need to talk about ambition" by Elizabeth
Aug 30, 2023
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon
Aug 28, 2023
"Assume Bad Faith" by Zack_M_Davis
Aug 28, 2023
"Large Language Models will be Great for Censorship" by Ethan Edwards
Aug 23, 2023
"Ten Thousand Years of Solitude" by agp
Aug 22, 2023
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov
Aug 22, 2023
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
Aug 21, 2023
"Inflection.ai is a major AGI lab" by Nikola
Aug 15, 2023
"Feedbackloop-first Rationality" by Raemon
Aug 15, 2023
"When can we trust model evaluations?" bu evhub
Aug 09, 2023
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
Aug 09, 2023
"My current LK99 questions" by Eliezer Yudkowsky
Aug 04, 2023
"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long
Aug 04, 2023
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes
Aug 04, 2023
"Thoughts on sharing information about language model capabilities" by paulfchristiano
Aug 02, 2023
"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth
Jul 31, 2023
"Self-driving car bets" by paulfchristiano
Jul 31, 2023
"Cultivating a state of mind where new ideas are born" by Henrik Karlsson
Jul 31, 2023
"Rationality !== Winning" by Raemon
Jul 28, 2023
"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel
Jul 28, 2023
"Grant applications and grand narratives" by Elizabeth
Jul 28, 2023
"Cryonics and Regret" by MvB
Jul 28, 2023
"Unifying Bargaining Notions (2/2)" by Diffractor
Jun 12, 2023
"The ants and the grasshopper" by Richard Ngo
Jun 06, 2023
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.
May 18, 2023
"An artificially structured argument for expecting AGI ruin" by Rob Bensinger
May 16, 2023
"How much do you believe your results?" by Eric Neyman
May 10, 2023
"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango
Apr 27, 2023
"On AutoGPT" by Zvi
Apr 19, 2023
"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky
Apr 12, 2023
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
Apr 05, 2023
"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares
Apr 05, 2023
"Deep Deceptiveness" by Nate Soares
Apr 05, 2023
"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch
Mar 28, 2023
"There’s no such thing as a tree (phylogenetically)" by Eukaryote
Mar 28, 2023
"Losing the root for the tree" by Adam Zerner
Mar 28, 2023
"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien
Mar 28, 2023
"Why I think strong general AI is coming soon" by Porby
Mar 28, 2023
"It Looks Like You’re Trying To Take Over The World" by Gwern
Mar 28, 2023
"What failure looks like" by Paul Christiano
Mar 28, 2023
"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes
Mar 21, 2023
""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon
Mar 21, 2023
"The Parable of the King and the Random Process" by moridinamael
Mar 14, 2023
"Enemies vs Malefactors" by Nate Soares
Mar 14, 2023
"The Waluigi Effect (mega-post)" by Cleo Nardo
Mar 08, 2023
"Acausal normalcy" by Andrew Critch
Mar 06, 2023
"Please don't throw your mind away" by TsviBT
Mar 01, 2023
"Cyborgism" by Nicholas Kees & Janus
Feb 15, 2023
"Childhoods of exceptional people" by Henrik Karlsson
Feb 14, 2023
"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares
Feb 13, 2023
"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça
Feb 10, 2023
"SolidGoldMagikarp (plus, prompt generation)"
Feb 08, 2023
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares
Feb 03, 2023
"Basics of Rationalist Discourse" by Duncan Sabien
Feb 02, 2023
"Sapir-Whorf for Rationalists" by Duncan Sabien
Jan 31, 2023
"My Model Of EA Burnout" by Logan Strohl
Jan 31, 2023
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo
Jan 25, 2023
"Recursive Middle Manager Hell" by Raemon
Jan 24, 2023
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin
Jan 12, 2023
"Models Don't 'Get Reward'" by Sam Ringer
Jan 12, 2023
"The Feeling of Idea Scarcity" by John Wentworth
Jan 12, 2023
"The next decades might be wild" by Marius Hobbhahn
Dec 21, 2022
"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn
Nov 17, 2022
"How my team at Lightcone sometimes gets stuff done" by jacobjacob
Nov 10, 2022
"Decision theory does not imply that we get to have nice things" by So8res
Nov 08, 2022
"What 2026 looks like" by Daniel Kokotajlo
Nov 07, 2022
Counterarguments to the basic AI x-risk case
Nov 04, 2022
"Introduction to abstract entropy" by Alex Altair
Oct 29, 2022
"Consider your appetite for disagreements" by Adam Zerner
Oct 25, 2022
"My resentful story of becoming a medical miracle" by Elizabeth
Oct 21, 2022
"The Redaction Machine" by Ben
Oct 02, 2022
"Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" by Ajeya Cotra
Sep 27, 2022
"The shard theory of human values" by Quintin Pope & TurnTrout
Sep 22, 2022
"Two-year update on my personal AI timelines" by Ajeya Cotra
Sep 22, 2022
"You Are Not Measuring What You Think You Are Measuring" by John Wentworth
Sep 21, 2022
"Do bamboos set themselves on fire?" by Malmesbury
Sep 20, 2022
"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith
Sep 18, 2022
"Deliberate Grieving" by Raemon
Sep 18, 2022
"Survey advice" by Katja Grace
Sep 18, 2022
"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC
Sep 15, 2022
"Humans are not automatically strategic" by Anna Salamon
Sep 15, 2022
"Local Validity as a Key to Sanity and Civilization" by Eliezer Yudkowsky
Sep 15, 2022
"Toolbox-thinking and Law-thinking" by Eliezer Yudkowsky
Sep 15, 2022
"Moral strategies at different capability levels" by Richard Ngo
Sep 14, 2022
"Worlds Where Iterative Design Fails" by John Wentworth
Sep 11, 2022
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
Sep 11, 2022
"Unifying Bargaining Notions (1/2)" by Diffractor
Sep 09, 2022
'Simulators' by Janus
Sep 05, 2022
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope
Aug 08, 2022
"Changing the world through slack & hobbies" by Steven Byrnes
Jul 30, 2022
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch
Jul 28, 2022
"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger
Jul 24, 2022
"What should you change in response to an "emergency"? And AI risk" by Anna Salamon
Jul 23, 2022
"On how various plans miss the hard bits of the alignment challenge" by Nate Soares
Jul 17, 2022
"Humans are very reliable agents" by Alyssa Vance
Jul 13, 2022
"Looking back on my alignment PhD" by TurnTrout
Jul 08, 2022
"It’s Probably Not Lithium" by Natália Coelho Mendonça
Jul 05, 2022
"What Are You Tracking In Your Head?" by John Wentworth
Jul 02, 2022
"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood
Jun 29, 2022
"Where I agree and disagree with Eliezer" by Paul Christiano
Jun 22, 2022
"Six Dimensions of Operational Adequacy in AGI Projects" by Eliezer Yudkowsky
Jun 21, 2022
"Moses and the Class Struggle" by lsusr
Jun 21, 2022
"Benign Boundary Violations" by Duncan Sabien
Jun 20, 2022
"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky
Jun 20, 2022