LessWrong (30+ Karma)

By LessWrong

Category: Technology


Subscribers: 3
Reviews: 0
Episodes: 250

Description

Audio narrations of LessWrong posts.

Episodes
“How I talk to those above me” by Maxwell Peterson
Mar 30, 2025
“The vision of Bill Thurston” by TsviBT
Mar 29, 2025
“Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle” by Czynski
Mar 29, 2025
“Softmax, Emmett Shear’s new AI startup focused on ‘Organic Alignment’” by Chipmonk
Mar 28, 2025
“The Pando Problem: Rethinking AI Individuality” by Jan_Kulveit
Mar 28, 2025
“Gemini 2.5 is the New SoTA” by Zvi
Mar 28, 2025
“AI #109: Google Fails Marketing Forever” by Zvi
Mar 28, 2025
“Explaining British Naval Dominance During the Age of Sail” by Arjun Panickssery
Mar 28, 2025
“Tracing the Thoughts of a Large Language Model” by Adam Jermyn
Mar 27, 2025
“Third-wave AI safety needs sociopolitical thinking” by Richard_Ngo
Mar 27, 2025
“Mistral Large 2 (123B) exhibits alignment faking” by Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Judd Rosenblatt, Mike Vaiana, AE Studio
Mar 27, 2025
[Linkpost] “Center on Long-Term Risk: Summer Research Fellowship 2025 - Apply Now” by Tristan Cook
Mar 27, 2025
“Avoid the Counterargument Collapse” by marknm
Mar 27, 2025
“Automated Researchers Can Subtly Sandbag” by gasteigerjo, Akbir Khan, Sam Bowman, Vlad Mikulik, Ethan Perez, Fabien Roger
Mar 27, 2025
“Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)” by Neel Nanda, lewis smith, Senthooran Rajamanoharan, Arthur Conmy, Callum McDougall, Tom Lieberum, János Kramár, Rohin Shah
Mar 26, 2025
“Eukaryote Skips Town - Why I’m leaving DC” by eukaryote
Mar 26, 2025
“Conceptual Rounding Errors” by Jan_Kulveit
Mar 26, 2025
“Goodhart Typology via Structure, Function, and Randomness Distributions” by JustinShovelain, Mateusz Bagiński
Mar 26, 2025
[Linkpost] “Latest map of all 40 copyright suits v. AI in U.S.” by Remmelt
Mar 26, 2025
“An overview of areas of control work” by ryan_greenblatt
Mar 26, 2025
“Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?” by Alex Mallen, charlie_griffin, Buck Shlegeris
Mar 26, 2025
“More on Various AI Action Plans” by Zvi
Mar 25, 2025
“On (Not) Feeling the AGI” by Zvi
Mar 25, 2025
“23andMe potentially for sale for $23M” by lemonhope
Mar 25, 2025
“Notes on countermeasures for exploration hacking (aka sandbagging)” by ryan_greenblatt
Mar 25, 2025
“Analyzing long agent transcripts (Docent)” by jsteinhardt
Mar 25, 2025
“An overview of control measures” by ryan_greenblatt
Mar 25, 2025
“Policy for LLM Writing on LessWrong” by jimrandomh
Mar 24, 2025
“AI ‘Deep Research’ Tools Reviewed” by sarahconstantin
Mar 24, 2025
“Recent AI model progress feels mostly like bullshit” by lc
Mar 24, 2025
“Will Jesus Christ return in an election year?” by Eric Neyman
Mar 24, 2025
“We need (a lot) more rogue agent honeypots” by Ozyrus
Mar 24, 2025
“Selective modularity: a research agenda” by cloud, Jacob G-W
Mar 24, 2025
“Solving willpower seems easier than solving aging” by Yair Halberstadt
Mar 23, 2025
“A long list of concrete projects and open problems in evals” by Marius Hobbhahn
Mar 23, 2025
“Reframing AI Safety as a Neverending Institutional Challenge” by scasper
Mar 23, 2025
“They Took MY Job?” by Zvi
Mar 23, 2025
“Do models say what they learn?” by Andy Arditi, marvinli, Joe Benton, Miles Turpin
Mar 22, 2025
“Good Research Takes are Not Sufficient for Good Strategic Takes” by Neel Nanda
Mar 22, 2025
“SHIFT relies on token-level features to de-bias Bias in Bios probes” by Tim Hua
Mar 22, 2025
“Silly Time” by jefftk
Mar 22, 2025
“How I force LLMs to generate correct code” by claudio
Mar 21, 2025
“Towards a scale-free theory of intelligent agency” by Richard_Ngo
Mar 21, 2025
“Intention to Treat” by Alicorn
Mar 20, 2025
“Socially Graceful Degradation” by Screwtape
Mar 20, 2025
“Apply to MATS 8.0!” by Ryan Kidd, K Richards
Mar 20, 2025
“Equations Mean Things” by abstractapplic
Mar 20, 2025
“Prioritizing threats for AI control” by ryan_greenblatt
Mar 19, 2025
“Elite Coordination via the Consensus of Power” by Richard_Ngo
Mar 19, 2025
[Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman
Mar 19, 2025
“Going Nova” by Zvi
Mar 19, 2025
“Boots theory and Sybil Ramkin” by philh
Mar 19, 2025
“LessOnline 2025: Early Bird Tickets On Sale” by Ben Pace
Mar 18, 2025
“I changed my mind about orca intelligence” by Towards_Keeperhood
Mar 18, 2025
“Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions” by Stuart_Armstrong, rgorman
Mar 18, 2025
“OpenAI #11: America Action Plan” by Zvi
Mar 18, 2025
“Feedback loops for exercise (VO2Max)” by Elizabeth
Mar 18, 2025
“FrontierMath Score of o3-mini Much Lower Than Claimed” by YafahEdelman
Mar 18, 2025
“Falsified draft: ‘Against Yudkowsky’s evolution analogy for AI x-risk’” by Fiora Sunshine
Mar 18, 2025
“Three Types of Intelligence Explosion” by rosehadshar, Tom Davidson, wdmacaskill
Mar 18, 2025
“Sentinel’s Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.” by NunoSempere
Mar 18, 2025
“Notable utility-monster-like failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format” by Roland Pihlakas, Sruthi Kuriakose
Mar 17, 2025
“Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn
Mar 17, 2025
“Metacognition Broke My Nail-Biting Habit” by Rafka
Mar 17, 2025
“How I’ve run major projects” by benkuhn
Mar 17, 2025
“Any-Benefit Mindset and Any-Reason Reasoning” by silentbob
Mar 16, 2025
“Announcing EXP: Experimental Summer Workshop on Collective Cognition” by Jan_Kulveit, Anna Gajdova
Mar 16, 2025
“Why White-Box Redteaming Makes Me Feel Weird” by Zygi Straznickas
Mar 16, 2025
“I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?” by shrimpy
Mar 16, 2025
“AI for AI safety” by Joe Carlsmith
Mar 15, 2025
“On MAIM and Superintelligence Strategy” by Zvi
Mar 15, 2025
“Unofficial 2024 LessWrong Survey Results” by Screwtape
Mar 15, 2025
“AI for Epistemics Hackathon” by Austin Chen
Mar 14, 2025
“Habermas Machine” by NicholasKees
Mar 14, 2025
“Interpreting Complexity” by Maxwell Adam
Mar 14, 2025
“Vacuum Decay: Expert Survey Results” by JessRiedel
Mar 14, 2025
“AI #107: The Misplaced Hype Machine” by Zvi
Mar 14, 2025
“Reducing LLM deception at scale with self-other overlap fine-tuning” by Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Mike Vaiana, Cameron Berg
Mar 13, 2025
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub
Mar 13, 2025
“Intelsat as a Model for International AGI Governance” by rosehadshar, wdmacaskill
Mar 13, 2025
“Don’t over-update on FrontierMath results” by David Matolcsi
Mar 13, 2025
“Anthropic, and taking ‘technical philosophy’ more seriously” by Raemon
Mar 13, 2025
“The Most Forbidden Technique” by Zvi
Mar 12, 2025
“Response to Scott Alexander on Imprisonment” by Zvi
Mar 12, 2025
“HPMOR Anniversary Parties: Coordination, Resources, and Discussion” by Screwtape
Mar 12, 2025
“Paths and waystations in AI safety” by Joe Carlsmith
Mar 12, 2025
“Preparing for the Intelligence Explosion” by fin, wdmacaskill
Mar 11, 2025
“Elon Musk May Be Transitioning to Bipolar Type I” by Cyborg25
Mar 11, 2025
“AI Control May Increase Existential Risk” by Jan_Kulveit
Mar 11, 2025
“Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases” by Fabien Roger
Mar 11, 2025
“We Have No Plan for Preventing Loss of Control in Open Models” by Andrew Dickson
Mar 11, 2025
“Trojan Sky” by Richard_Ngo
Mar 11, 2025
“The Manus Marketing Madness” by Zvi
Mar 11, 2025
“OpenAI: Detecting misbehavior in frontier reasoning models” by Daniel Kokotajlo
Mar 11, 2025
“Introducing 11 New AI Safety Organizations - Catalyze’s Winter 24/25 London Incubation Program Cohort” by Alexandra Bos
Mar 10, 2025
“Everything I Know About Semantics I Learned From Music Notation” by J Bostock
Mar 10, 2025
“Book Review: Affective Neuroscience” by sarahconstantin
Mar 10, 2025
“Phoenix Rising” by Metacelsus
Mar 09, 2025
“The machine has no mouth and it must scream” by zef
Mar 08, 2025
“Childhood and Education #9: School is Hell” by Zvi
Mar 08, 2025
“AI #106: Not so Fast” by Zvi
Mar 08, 2025
“Lots of brief thoughts on Software Engineering” by Yair Halberstadt
Mar 07, 2025
“Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource” by Bryce Robertson, Søren Elverlin
Mar 07, 2025
“So how well is Claude playing Pokémon?” by Julian Bradshaw
Mar 07, 2025
“What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit” by garrison
Mar 07, 2025
“We should start looking for scheming ‘in the wild’” by Marius Hobbhahn
Mar 06, 2025
“The Hidden Cost of Our Lies to AI” by Nicholas Andresen
Mar 06, 2025
“On Writing #1” by Zvi
Mar 06, 2025
“On the Rationality of Deterring ASI” by Dan H
Mar 05, 2025
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis
Mar 05, 2025
“On OpenAI’s Safety and Alignment Philosophy” by Zvi
Mar 05, 2025
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis
Mar 05, 2025
“For scheming, we should first focus on detection and then on prevention” by Marius Hobbhahn
Mar 04, 2025
“The Semi-Rational Wildfirefighter” by P. João
Mar 04, 2025
“The Milton Friedman Model of Policy Change” by JohnofCharleston
Mar 04, 2025
“Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers” by Nikola Jurkovic, jleibowich, Tom Davidson
Mar 04, 2025
“What goals will AIs have? A list of hypotheses” by Daniel Kokotajlo
Mar 03, 2025
“Methods for strong human germline engineering” by TsviBT
Mar 03, 2025
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard
Mar 03, 2025
“Will LLM agents become the first takeover-capable AGIs?” by Seth Herd
Mar 03, 2025
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout
Mar 02, 2025
“Maintaining Alignment during RSI as a Feedback Control Problem” by beren
Mar 02, 2025
“Open problems in emergent misalignment” by Jan Betley, Daniel Tan
Mar 01, 2025
“On Emergent Misalignment” by Zvi
Feb 28, 2025
“OpenAI releases ChatGPT 4.5” by Seth Herd
Feb 28, 2025
“How to Corner Liars: A Miasma-Clearing Protocol” by ymeskhout
Feb 28, 2025
“Weirdness Points” by lsusr
Feb 28, 2025
“Why Can’t We Hypothesize After the Fact?” by David Udell
Feb 27, 2025
“Fuzzing LLMs sometimes makes them reveal their secrets” by Fabien Roger
Feb 26, 2025
“[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations” by Lucy Farnik
Feb 26, 2025
“Osaka” by lsusr
Feb 26, 2025
“Time to Welcome Claude 3.7” by Zvi
Feb 26, 2025
“You can just wear a suit” by lsusr
Feb 26, 2025
“what an efficient market feels from inside” by DMMF
Feb 26, 2025
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans
Feb 25, 2025
“Grok Grok” by Zvi
Feb 25, 2025
“Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?” by Jesse Richardson, Yoshua Bengio, dwk, mattmacdermott
Feb 25, 2025
“Dream, Truth, & Good” by abramdemski
Feb 25, 2025
“Conference Report: Threshold 2030 - Modeling AI Economic Futures” by Deric Cheng, Justin Bullock, Deger Turan, Elliot Mckernon
Feb 25, 2025
“Training AI to do alignment research we don’t already know how to do” by joshc
Feb 24, 2025
“Anthropic releases Claude 3.7 Sonnet with extended thinking mode” by LawrenceC
Feb 24, 2025
“Forecasting Frontier Language Model Agent Capabilities” by Govind Pimpale, Axel Højmark, Jérémy Scheurer, Marius Hobbhahn
Feb 24, 2025
“Evaluating ‘What 2026 Looks Like’ So Far” by Jonny Spicer
Feb 24, 2025
“Export Surplusses” by lsusr
Feb 24, 2025
“Judgements: Merging Prediction & Evidence” by abramdemski
Feb 24, 2025
“The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research” by Arthur Conmy, Neel Nanda
Feb 24, 2025
“Power Lies Trembling: a three-book review” by Richard_Ngo
Feb 23, 2025
“Proselytizing” by lsusr
Feb 23, 2025
“HPMOR Anniversary Guide” by Screwtape
Feb 23, 2025
“Alignment can be the ‘clean energy’ of AI” by Cameron Berg, Judd Rosenblatt, AE Studio
Feb 22, 2025
“ParaScope: Do Language Models Plan the Upcoming Paragraph?” by NickyP
Feb 21, 2025
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis
Feb 21, 2025
“On OpenAI’s Model Spec 2.0” by Zvi
Feb 21, 2025
“The first RCT for GLP-1 drugs and alcoholism isn’t what we hoped” by dynomight
Feb 21, 2025
“AI #104: American State Capacity on the Brink” by Zvi
Feb 21, 2025
“Timaeus in 2024” by Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel, Daniel Murfet
Feb 21, 2025
“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby
Feb 20, 2025
“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby
Feb 20, 2025
“Go Grok Yourself” by Zvi
Feb 20, 2025
“How accurate was my ‘Altered Traits’ book review?” by lsusr
Feb 19, 2025
“SuperBabies podcast with Gene Smith” by Eneasz
Feb 19, 2025
“How might we safely pass the buck to AI?” by joshc
Feb 19, 2025
“How to Make Superbabies” by GeneSmith, kman
Feb 19, 2025
“Dear AGI,” by Nathan Young
Feb 18, 2025
“Do models know when they are being evaluated?” by Govind Pimpale
Feb 18, 2025
“AGI Safety & Alignment @ Google DeepMind is hiring” by Rohin Shah
Feb 17, 2025
“A History of the Future, 2025-2040” by L Rudolf L
Feb 17, 2025
“Thermodynamic entropy = Kolmogorov complexity” by EbTech
Feb 17, 2025
“Celtic Knots on Einstein Lattice” by Ben
Feb 17, 2025
“Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme” by Vanessa Kosoy
Feb 16, 2025
“It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape
Feb 16, 2025
“Microplastics: Much Less Than You Wanted To Know” by jenn, kaleb, Brent
Feb 16, 2025
“A computational no-coincidence principle” by Eric Neyman
Feb 14, 2025
“A short course on AGI safety from the GDM Alignment team” by Vika, Rohin Shah
Feb 14, 2025
“The Mask Comes Off: A Trio of Tales” by Zvi
Feb 14, 2025
“Ambiguous out-of-distribution generalization on an algorithmic task” by Wilson Wu, Experience Machine
Feb 14, 2025
“≤10-year Timelines Remain Unlikely Despite DeepSeek and o3” by Rafael Harth
Feb 14, 2025
“Self-dialogue: Do behaviorist rewards make scheming AGIs?” by Steven Byrnes
Feb 14, 2025
“Murder plots are infohazards” by Chris Monteiro
Feb 13, 2025
“How do we solve the alignment problem?” by Joe Carlsmith
Feb 13, 2025
“Extended analogy between humans, corporations, and AIs.” by Daniel Kokotajlo
Feb 13, 2025
“Virtue signaling, and the ‘humans-are-wonderful’ bias, as a trust exercise” by lc
Feb 13, 2025
“My model of what is going on with LLMs” by Cole Wyeth
Feb 13, 2025
“Skepticism towards claims about the views of powerful institutions” by tlevin
Feb 13, 2025
“Why you maybe should lift weights, and How to.” by samusasuke
Feb 13, 2025
“Not all capabilities will be created equal: focus on strategically superhuman agents” by benwr
Feb 13, 2025
“The Paris AI Anti-Safety Summit” by Zvi
Feb 12, 2025
“Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs” by Matrice Jacobine
Feb 12, 2025
“Proof idea: SLT to AIT” by Lucius Bushnaq
Feb 11, 2025
“On Deliberative Alignment” by Zvi
Feb 11, 2025
“The News is Never Neglected” by lsusr
Feb 11, 2025
“Nonpartisan AI safety” by Yair Halberstadt
Feb 11, 2025
“Knocking Down My AI Optimist Strawman” by tailcalled
Feb 11, 2025
“Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison
Feb 11, 2025
“Levels of Friction” by Zvi
Feb 10, 2025
“Reasons-based choice and cluelessness” by JesseClifton
Feb 10, 2025
“On the Meta and DeepMind Safety Frameworks” by Zvi
Feb 09, 2025
“Gary Marcus now saying AI can’t do things it can already do” by Benjamin_Todd
Feb 09, 2025
“Two hemispheres - I do not think it means what you think it means” by Viliam
Feb 09, 2025
“‘Think it Faster’ worksheet” by Raemon
Feb 09, 2025
“A Problem to Solve Before Building a Deception Detector” by Eleni Angelou, lewis smith
Feb 08, 2025
“Wild Animal Suffering Is The Worst Thing In The World” by omnizoid
Feb 08, 2025
“Research directions Open Phil wants to fund in technical AI safety” by jake_mendel, maxnadeau, Peter Favaloro
Feb 08, 2025
“Racing Towards Fusion and AI” by Jeffrey Heninger
Feb 08, 2025
“So You Want To Make Marginal Progress...” by johnswentworth
Feb 08, 2025
“How AI Takeover Might Happen in 2 Years” by joshc
Feb 07, 2025
“Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas” by jake_mendel, maxnadeau, Peter Favaloro
Feb 06, 2025
“MATS Applications + Research Directions I’m Currently Excited About” by Neel Nanda
Feb 06, 2025
“Detecting Strategic Deception Using Linear Probes” by Nicholas Goldowsky-Dill, bilalchughtai, StefanHex, Marius Hobbhahn
Feb 06, 2025
“Voting Results for the 2023 Review” by Raemon
Feb 06, 2025
“The Risk of Gradual Disempowerment from AI” by Zvi
Feb 06, 2025
“C’mon guys, Deliberate Practice is Real” by Raemon
Feb 06, 2025
“Wired on: ’DOGE personnel with admin access to” by Raemon
Feb 05, 2025
“Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker” by Daniel Herrmann, Aydin Mohseni, ben_levinstein
Feb 05, 2025
“Language Models Use Trigonometry to Do Addition” by Subhash Kantamneni
Feb 05, 2025
“Reviewing LessWrong: Screwtape’s Basic Answer” by Screwtape
Feb 05, 2025
“We’re in Deep Research” by Zvi
Feb 05, 2025
“Meta: Frontier AI Framework” by Zach Stein-Perlman
Feb 05, 2025
“Anti-Slop Interventions?” by abramdemski
Feb 05, 2025
“Tear Down the Burren” by jefftk
Feb 04, 2025
“o3-mini Early Days” by Zvi
Feb 04, 2025
“OpenAI releases deep research agent” by Seth Herd
Feb 03, 2025
“Pick two: concise, comprehensive, or clear rules” by Screwtape
Feb 03, 2025
“2024 was the year of the big battery, and what that means for solar power” by transhumanist_atom_understander
Feb 03, 2025
“Alderaan” by lsusr
Feb 02, 2025
“The Simplest Good” by Jesse Hoogland
Feb 02, 2025
“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit
Feb 02, 2025
“Falsehoods you might believe about people who are at a rationalist meetup” by Screwtape
Feb 02, 2025
“Some articles in ‘International Security’ that I enjoyed” by Buck
Feb 01, 2025
“DeepSeek: Don’t Panic” by Zvi
Feb 01, 2025
“The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace
Jan 31, 2025
“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt
Jan 31, 2025
“Catastrophe through Chaos” by Marius Hobbhahn
Jan 31, 2025
“In response to critiques of Guaranteed Safe AI” by Nora_Ammann
Jan 31, 2025
“AI #101: The Shallow End” by Zvi
Jan 31, 2025
“What’s Behind the SynBio Bust?” by sarahconstantin
Jan 31, 2025
“Steering Gemini with BiDPO” by TurnTrout
Jan 31, 2025
“Thread for Sense-Making on Recent Murders and How to Sanely Respond” by Ben Pace
Jan 31, 2025
“You should read Hobbes, Locke, Hume, and Mill via EarlyModernTexts.com” by Arjun Panickssery
Jan 31, 2025
“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud
Jan 30, 2025
“A sketch of an AI control safety case” by Tomek Korbak, joshc, Benjamin Hilton, Buck, Geoffrey Irving
Jan 30, 2025
“Anthropic CEO calls for RSI” by Andrea_Miotti
Jan 30, 2025
“Planning for Extreme AI Risks” by joshc
Jan 29, 2025
“Dario Amodei: On DeepSeek and Export Controls” by Zach Stein-Perlman
Jan 29, 2025
“Operator” by Zvi
Jan 29, 2025
“Open Problems in Mechanistic Interpretability” by Lee Sharkey, bilalchughtai
Jan 29, 2025
“Fake thinking and real thinking” by Joe Carlsmith
Jan 29, 2025
“DeepSeek Panic at the App Store” by Zvi
Jan 29, 2025
“The Game Board has been Flipped: Now is a good time to rethink what you’re doing” by Alex Lintz
Jan 29, 2025