The Nonlinear Library: LessWrong Weekly

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org.

© 2023 The Nonlinear Fund | https://www.nonlinear.org | podcast@nonlinear.org | Last updated: Sat, 02 Sep 2023 21:55:25 +0000

LW - Against Almost Every Theory of Impact of Interpretability by Charbel-Raphaël

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Against Almost Every Theory of Impact of Interpretability, published by Charbel-Raphaël on August 17, 2023 on LessWrong.

Epistemic Status: I believe I am well-versed in this subject. I erred on the side of making claims that were too strong and allowing readers to disagree and start a discussion about precise points, rather than trying to edge-case every statement. I also think that using memes is important because safety ideas are boring and anti-memetic. So let's go!

Many thanks to @scasper, @Sid Black, @Neel Nanda, @Fabien Roger, @Bogdan Ionut Cirstea, @WCargo, @Alexandre Variengien, @Jonathan Claybrough, @Edoardo Pona, @Andrea_Miotti, Diego Dorn, Angélina Gentaz, Clement Dumas, and Enzo Marsot for useful feedback and discussions.

When I started this post, I began by critiquing the article A Long List of Theories of Impact for Interpretability by Neel Nanda, but I later expanded the scope of my critique. Some of the ideas presented are not supported by anyone, but to explain the difficulties I still need to 1. explain them and 2. criticize them. This gives the post an adversarial vibe. I'm sorry about that, and I think that doing research into interpretability, even if it's no longer what I consider a priority, is still commendable.

How to read this document? Most of this document is not technical, except for the section "What does the end story of interpretability look like?", which can mostly be skipped at first. I expect this document to also be useful for people not doing interpretability research. The different sections are mostly independent, and I've added a lot of bookmarks to help modularize this post.

If you have very little time, just read (this is also the part where I'm most confident):
- Auditing deception with Interp is out of reach (4 min)
- Enumerative safety critique (2 min)
- Technical Agendas with better Theories of Impact (1 min)

Here is the list of claims that I will defend (bolded sections are the most important ones):
- The overall Theory of Impact is quite poor
- Interp is not a good predictor of future systems
- Auditing deception with interp is out of reach
- What does the end story of interpretability look like? That's not clear at all. Enumerative safety? Reverse engineering? Olah's Interpretability dream? Retargeting the search? Relaxed adversarial training? Microscope AI?
- Preventive measures against Deception seem much more workable
- Steering the world towards transparency
- Cognitive Emulations - Explainability By design
- Interpretability May Be Overall Harmful
- Outside view: The proportion of junior researchers doing Interp rather than other technical work is too high
- So far my best ToI for interp: Nerd Sniping?
- Even if we completely solve interp, we are still in danger
- Technical Agendas with better Theories of Impact
- Conclusion

Note: The purpose of this post is to criticize the Theory of Impact (ToI) of interpretability for deep learning models such as GPT-like models, and not the explainability and interpretability of small models.

The emperor has no clothes?

I gave a talk about the different risk models, followed by an interpretability presentation, then I got a problematic question: "I don't understand, what's the point of doing this?" Hum. Feature viz? (left image) Um, it's pretty, but is this useful? Is this reliable?
GradCam (a pixel attribution technique, like in the above right figure)? It's pretty, but I've never seen anybody use it in industry. Pixel attribution seems useful, but accuracy remains king. Induction heads? Ok, we are maybe on track to reverse engineer the mechanism of regex in LLMs. Cool. The considerations in the last bullet points are based on feeling and are not real arguments. Furthermore, most mechanistic interpretability isn't even aimed at being useful right now. But in the rest of the post, we'll find out if...
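For readers who have not seen pixel attribution before, here is a minimal sketch of what a Grad-CAM-style computation involves: take the gradient of a class score with respect to a convolutional layer's feature maps, use it to weight those maps, and ReLU the weighted sum into a saliency heatmap. This is an illustrative example only, not code from the post; the model, layer, and input below are placeholder assumptions.

```python
# Minimal Grad-CAM-style sketch (illustrative only; the model, layer, and input
# below are placeholders). Weight a conv layer's feature maps by the gradient of
# the target class score, then ReLU the weighted sum into a saliency heatmap.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()  # in practice you'd load pretrained weights
activations = {}

def save_activation(module, inputs, output):
    activations["feat"] = output  # [1, C, H, W] feature maps of the last conv block

hook = model.layer4.register_forward_hook(save_activation)

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed input image
logits = model(image)
class_idx = logits.argmax(dim=1).item()
score = logits[0, class_idx]  # scalar score for the predicted class

grads = torch.autograd.grad(score, activations["feat"])[0]   # [1, C, H, W]
weights = grads.mean(dim=(2, 3), keepdim=True)               # one weight per channel
cam = F.relu((weights * activations["feat"]).sum(dim=1))     # [1, H, W]
cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                    mode="bilinear", align_corners=False)
heatmap = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # 0-1 saliency map
hook.remove()
```

The question the post raises is not whether such a heatmap can be computed, but whether it tells you anything reliable about the model.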
Charbel-Raphaël | Published: Thu, 17 Aug 2023 20:31:01 +0000 | Duration: 01:09:12 | Original article: https://www.lesswrong.com/posts/LNA8mubrByG7SFacm/against-almost-every-theory-of-impact-of-interpretability-1

LW - My current LK99 questions by Eliezer Yudkowsky

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My current LK99 questions, published by Eliezer Yudkowsky on August 1, 2023 on LessWrong.

So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors and market sentiment reckoning." (My initial entry into the affray, having been driven by people online presenting as confidently YES when the prediction markets were not confidently YES.) And then I thought to myself, "This LK99 issue seems complicated enough that it'd be worth doing an actual Bayesian calculation on it"--a rare thought; I don't think I've done an actual explicit numerical Bayesian update in at least a year.

In the process of trying to set up an explicit calculation, I realized I felt very unsure about some critically important quantities, to the point where it no longer seemed worth trying to do the calculation with numbers. This is the System Working As Intended.

On July 30th, Danielle Fong said of this temperature-current-voltage graph, 'Normally as current increases, voltage drop across a material increases. in a superconductor, voltage stays nearly constant, 0. that appears to be what's happening here -- up to a critical current. with higher currents available at lower temperatures deeply in the "fraud or superconduct" territory, imo. like you don't get this by accident -- you either faked it, or really found something.'

The graph Fong is talking about only appears in the initial paper put forth by Young-Wan Kwon, allegedly without authorization. A different graph, though similar, appears in Fig. 6 on p. 12 of the 6-author LK-endorsed paper rushed out in response.

Is it currently widely held by expert opinion, that this diagram has no obvious or likely explanation except "superconductivity" or "fraud"? If the authors discovered something weird that wasn't a superconductor, or if they just hopefully measured over and over until they started getting some sort of measurement error, is there any known, any obvious way they could have gotten the same graph? One person alleges an online rumor that poorly connected electrical leads can produce the same graph. Is that a conventional view?

Alternatively: If this material is a superconductor, have we seen what we expected to see? Is the diminishing current capacity with increased temperature usual? How does this alleged direct measurement of superconductivity square up with the current-story-as-I-understood-it that the material is only being very poorly synthesized, probably only in granules or gaps, and hence only detectable by looking for magnetic resistance / pinning?

This is my number-one question. Call it question 1-NO, because it's the question of "How does the NO story explain this graph, and how prior-improbable or prior-likely was that story?", with respect to my number one question. Though I'd also like to know the 1-YES details: whether this looks like a high-prior-probability superconductivity graph; or a graph that requires a new kind of superconductivity, but one that's theoretically straightforward given a central story; or if it looks like unspecified weird superconductivity, with there being no known theory that predicts a graph looking roughly like this.

What's up with all the partial levitation videos? Possibilities I'm currently tracking:

2-NO-A: There's something called "diamagnetism" which exists in other materials.
The videos by LK and attempted replicators show the putative superconductor being repelled from the magnet, but not being locked in space relative to the magnet. Superconductors are supposed to exhibit Meissner pinning, and the failure of the material to be pinned to the magnet indicates that this isn't a superconductor. (Sabine Hossenfelder seems to talk this way here. "I lost hope when I saw this video; this doesn't look like the Meissner ...
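The post mentions trying, and abandoning, an explicit numerical Bayesian calculation. For concreteness, here is a minimal sketch of what such an update looks like in odds form; every number below is an invented placeholder, not an estimate from the post.

```python
# Minimal sketch of an explicit numerical Bayesian update in odds form.
# Every number here is an invented placeholder, not an estimate from the post.

def update(prior_prob, likelihood_ratios):
    """Multiply prior odds by P(evidence | YES) / P(evidence | NO) for each item,
    then convert the posterior odds back to a probability."""
    odds = prior_prob / (1 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Hypothetical evidence items for a question like "is this a superconductor?"
likelihood_ratios = [
    3.0,  # e.g. a current-voltage graph judged 3x likelier under YES than NO
    0.5,  # e.g. levitation videos showing repulsion but no flux pinning
    0.8,  # e.g. mixed early replication reports
]

posterior = update(prior_prob=0.05, likelihood_ratios=likelihood_ratios)
print(f"posterior probability: {posterior:.3f}")  # ~0.059 with these made-up numbers
```

The arithmetic is the easy part; the post's point is that the hard part is feeling confident about the likelihood ratios themselves.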
Eliezer Yudkowsky | Published: Tue, 01 Aug 2023 23:23:17 +0000 | Duration: 08:40 | Original article: https://www.lesswrong.com/posts/EzSH9698DhBsXAcYY/my-current-lk99-questions

LW - Yes, It's Subjective, But Why All The Crabs? by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Yes, It's Subjective, But Why All The Crabs?, published by johnswentworth on July 28, 2023 on LessWrong.

Crabs

Nature really loves to evolve crabs. Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That's the obvious explanation of the similarity, after all: if the crabs descended from a common ancestor, then of course we'd expect them to be pretty similar.

...but then our hypothetical biologist might start to notice surprisingly deep differences between all these crabs. The smoking gun, of course, would come with genetic sequencing: if the crabs' physiological similarity is achieved by totally different genetic means, or if functionally-irrelevant mutations differ across crab-species by more than mutational noise would induce over the hypothesized evolutionary timescale, then we'd have to conclude that the crabs had different lineages. (In fact, historically, people apparently figured out that crabs have different lineages long before sequencing came along.)

Now, having accepted that the crabs have very different lineages, the differences are basically explained. If the crabs all descended from very different lineages, then of course we'd expect them to be very different.

...but then our hypothetical biologist returns to the original empirical fact: all these crabs sure are very similar in form. If the crabs all descended from totally different lineages, then the convergent form is a huge empirical surprise! The differences between the crabs have ceased to be an interesting puzzle - they're explained - but now the similarities are the interesting puzzle. What caused the convergence?

To summarize: if we imagine that the crabs are all closely related, then any deep differences are a surprising empirical fact, and are the main remaining thing our model needs to explain. But once we accept that the crabs are not closely related, then any convergence/similarity is a surprising empirical fact, and is the main remaining thing our model needs to explain.

Agents

A common starting point for thinking about "What are agents?" is Dennett's intentional stance:

"Here is how it works: first you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose. Then you figure out what desires it ought to have, on the same considerations, and finally you predict that this rational agent will act to further its goals in the light of its beliefs. A little practical reasoning from the chosen set of beliefs and desires will in most instances yield a decision about what the agent ought to do; that is what you predict the agent will do." - Daniel Dennett, The Intentional Stance, p. 17

One of the main interesting features of the intentional stance is that it hypothesizes subjective agency: I model a system as agentic, and you and I might model different systems as agentic. Compared to a starting point which treats agency as objective, the intentional stance neatly explains many empirical facts - e.g. different people model different things as agents at different times.
Sometimes I model other people as planning to achieve goals in the world, sometimes I model them as following set scripts, and you and I might differ in which way we're modeling any given person at any given time. If agency is subjective, then the differences are basically explained.

...but then we're faced with a surprising empirical fact: there's a remarkable degree of convergence among which things people do-or-don't model as agentic at which times. Humans yes, rocks no. Even among cases where people disagree, there are certain kinds of arguments/evidence which people generally agree update in a certain direction - e.g. ...
johnswentworth | Published: Fri, 28 Jul 2023 20:05:31 +0000 | Duration: 10:26 | Original article: https://www.lesswrong.com/posts/qsRvpEwmgDBNwPHyP/yes-it-s-subjective-but-why-all-the-crabs

LW - Alignment Grantmaking is Funding-Limited Right Now by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Grantmaking is Funding-Limited Right Now, published by johnswentworth on July 19, 2023 on LessWrong.

For the past few years, I've generally mostly heard from alignment grantmakers that they're bottlenecked by projects/people they want to fund, not by amount of money. Grantmakers generally had no trouble funding the projects/people they found object-level promising, with money left over. In that environment, figuring out how to turn marginal dollars into new promising researchers/projects - e.g. by finding useful recruitment channels or designing useful training programs - was a major problem.

Within the past month or two, that situation has reversed. My understanding is that alignment grantmaking is now mostly funding-bottlenecked. This is mostly based on word-of-mouth, but for instance, I heard that the recent Lightspeed Grants round received far more applications that passed the bar for basic promising-ness than they could fund. I've also heard that the Long-Term Future Fund (which funded my current grant) now has insufficient money for all the grants they'd like to fund.

I don't know whether this is a temporary phenomenon, or longer-term. Alignment research has gone mainstream, so we should expect both more researchers interested and more funders interested. It may be that the researchers pivot a bit faster, but funders will catch up later. Or, it may be that the funding bottleneck becomes the new normal. Regardless, it seems like grantmaking is at least funding-bottlenecked right now.

Some takeaways:
- If you have a big pile of money and would like to help, but haven't been donating much to alignment because the field wasn't money constrained, now is your time!
- If this situation is the new normal, then earning-to-give for alignment may look like a more useful option again. That said, at this point committing to an earning-to-give path would be a bet on this situation being the new normal.
- Grants for upskilling, training junior people, and recruitment make a lot less sense right now from grantmakers' perspective.
- For those applying for grants, asking for less money might make you more likely to be funded. (Historically, grantmakers consistently tell me that most people ask for less money than they should; I don't know whether that will change going forward, but now is an unusually probable time for it to change.)

Note that I am not a grantmaker, I'm just passing on what I hear from grantmakers in casual conversation. If anyone with more knowledge wants to chime in, I'd appreciate it.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
johnswentworth | Published: Wed, 19 Jul 2023 17:40:10 +0000 | Duration: 02:26 | Original article: https://www.lesswrong.com/posts/SbC7duHNDHkd3PkgG/alignment-grantmaking-is-funding-limited-right-now

QBeN49SoKpDMX3kKk_NL_LW_LW-week LW - Accidentally Load Bearing by jefftk Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Accidentally Load Bearing, published by jefftk on July 13, 2023 on LessWrong. Sometimes people will talk about Chesterton's Fence, the idea that if you want to change something - removing an apparently useless fence - you should first determine why it was set up that way: The gate or fence did not grow there. It was not set up by somnambulists who built it in their sleep. It is highly improbable that it was put there by escaped lunatics who were for some reason loose in the street. Some person had some reason for thinking it would be a good thing for somebody. And until we know what the reason was, we really cannot judge whether the reason was reasonable. It is extremely probable that we have overlooked some whole aspect of the question, if something set up by human beings like ourselves seems to be entirely meaningless and mysterious. - G. K. Chesterton, The Drift From Domesticity Figuring out something's designed purpose can be helpful in evaluating changes, but a risk is that it puts you in a frame of mind where what matters is the role the original builders intended. A few years ago I was rebuilding a bathroom in our house, and there was a vertical stud that was in the way. I could easily tell why it was there: it was part of a partition for a closet. And since I knew its designed purpose and no longer needed it for that, the Chesterton's Fence framing would suggest that it was fine to remove it. Except that over time it had become accidentally load bearing: through other (ill-conceived) changes to the structure this stud was now helping hold up the second floor of the house. In addition to considering why something was created, you also need to consider what additional purposes it may have since come to serve. This is a concept I've run into a lot when making changes to complex computer systems. It's useful to look back through the change history, read original design documents, and understand why a component was built the way it was. But you also need to look closely at how the component integrates into the system today, where it can easily have taken on additional roles. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
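To make the software analogy concrete, here is a small illustrative Python sketch (the module and function names are hypothetical, invented for illustration rather than taken from the post): a helper written for one purpose quietly becomes load bearing for a second caller, so checking only the original design intent would miss the newer dependency.

```python
# report.py -- the "closet partition": a helper originally written
# only so report generation could format dates consistently.
from datetime import date

def format_date(d: date) -> str:
    """Originally added for monthly reports; that was its designed purpose."""
    return d.strftime("%Y-%m-%d")

def monthly_report(d: date) -> str:
    return f"Report for {format_date(d)}"

# billing.py -- added years later; it now also leans on format_date,
# because invoice IDs must sort lexicographically by date.
def invoice_id(d: date, n: int) -> str:
    # Accidentally load bearing: changing format_date's format string,
    # or deleting it once reports are retired, would silently break
    # invoice ID ordering, even though the helper's original reason
    # for existing is gone.
    return f"INV-{format_date(d)}-{n:04d}"

if __name__ == "__main__":
    print(monthly_report(date(2023, 7, 13)))   # Report for 2023-07-13
    print(invoice_id(date(2023, 7, 13), 7))    # INV-2023-07-13-0007
```

Removing format_date after the reports are retired would pass the Chesterton's Fence test for its designed purpose and still break invoicing, which is the point of the post.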
jefftk https://www.lesswrong.com/posts/QBeN49SoKpDMX3kKk/accidentally-load-bearing Thu, 13 Jul 2023 17:21:16 +0000 02:01
LNwtnZ7MGTmeifkz3_NL_LW_LW-week LW - Munk AI debate: confusions and possible cruxes by Steven Byrnes Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Munk AI debate: confusions and possible cruxes, published by Steven Byrnes on June 27, 2023 on LessWrong. There was a debate on the statement “AI research and development poses an existential threat” (“x-risk” for short), with Max Tegmark and Yoshua Bengio arguing in favor, and Yann LeCun and Melanie Mitchell arguing against. The YouTube link is here, and a previous discussion on this forum is here. The first part of this blog post is a list of five ways that I think the two sides were talking past each other. The second part is some apparent key underlying beliefs of Yann and Melanie, and how I might try to change their minds. While I am very much on the “in favor” side of this debate, I didn’t want to make this just a “why Yann’s and Melanie’s arguments are all wrong” blog post. OK, granted, it’s a bit of that, especially in the second half. But I hope people on the “anti” side will find this post interesting and not-too-annoying. Five ways people were talking past each other 1. Treating efforts to solve the problem as exogenous or not This subsection doesn’t apply to Melanie, who rejected the idea that there is any existential risk in the foreseeable future. But Yann suggested that there was no existential risk because we will solve it; whereas Max and Yoshua argued that we should acknowledge that there is an existential risk so that we can solve it. By analogy, fires tend not to spread through cities because the fire department and fire codes keep them from spreading. Two perspectives on this are: If you’re an outside observer, you can say that “fires can spread through a city” is evidently not a huge problem in practice. If you’re the chief of the fire department, or if you’re developing and enforcing fire codes, then “fires can spread through a city” is an extremely serious problem that you’re thinking about constantly. I don’t think this was a major source of talking-past-each-other, but added a nonzero amount of confusion. 2. Ambiguously changing the subject to “timelines to x-risk-level AI”, or to “whether large language models (LLMs) will scale to x-risk-level AI” The statement under debate was “AI research and development poses an existential threat”. This statement does not refer to any particular line of AI research, nor any particular time interval. The four participants’ positions in this regard seemed to be: Max and Yoshua: Superhuman AI might happen in 5-20 years, and LLMs have a lot to do with why a reasonable person might believe that. Yann: Human-level AI might happen in 5-20 years, but LLMs have nothing to do with that. LLMs have fundamental limitations. But other types of ML research could get there—e.g. my (Yann’s) own research program. Melanie: LLMs have fundamental limitations, and Yann’s research program is doomed to fail as well. The kind of AI that might pose an x-risk will absolutely not happen in the foreseeable future. (She didn’t quantify how many years is the “foreseeable future”.) It seemed to me that all four participants (and the moderator!) were making timelines and LLM-related arguments, in ways that were both annoyingly vague, and unrelated to the statement under debate. (If astronomers found a giant meteor projected to hit the earth in the year 2123, nobody would question the use of the term “existential threat”, right??) 
As usual (see my post AI doom from an LLM-plateau-ist perspective), this area was where I had the most complaints about people “on my side”, particularly Yoshua getting awfully close to conceding that under-20-year timelines are a necessary prerequisite to being concerned about AI x-risk. (I don’t know if he literally believes that, but I think he gave that impression. Regardless, I strongly disagree, more on which later.) 3. Vibes-based “meaningless arguments” I recommend in the strongest possible terms that ...
Steven Byrnes https://www.lesswrong.com/posts/LNwtnZ7MGTmeifkz3/munk-ai-debate-confusions-and-possible-cruxes Tue, 27 Jun 2023 15:29:30 +0000 13:38
f3kM7NM5eGMTp3KtZ_NL_LW_LW-week LW - Lessons On How To Get Things Right On The First Try by johnswentworth Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Lessons On How To Get Things Right On The First Try, published by johnswentworth on June 19, 2023 on LessWrong. This post is based on several true stories, from a workshop which John has run a few times over the past year. John: Welcome to the Ball -> Cup workshop! Your task for today is simple: I’m going to roll this metal ball: . down this hotwheels ramp: . and off the edge. Your job is to tell me how far from the bottom of the ramp to place a cup on the floor, such that the ball lands in the cup. Oh, and you only get one try. General notes: I won’t try to be tricky with this exercise. You are welcome to make whatever measurements you want of the ball, ramp, etc. You can even do partial runs, e.g. roll the ball down the ramp and stop it at the bottom, or throw the ball through the air. But you only get one full end-to-end run, and anything too close to an end-to-end run is discouraged. After all, in the AI situation for which the exercise is a metaphor, we don’t know exactly when something might foom; we want elbow room. That’s it! Good luck, and let me know when you’re ready to give it a shot. [At this point readers may wish to stop and consider the problem themselves.] Alison: Let’s get that ball in that cup. It looks like this is probably supposed to be a basic physics kind of problem.but there’s got to be some kind of twist or else why would he be having us do it? Maybe the ball is surprisingly light..or maybe the camera angle is misleading and we are supposed to think of something wacky like that?? The Unnoticed Observer: Muahahaha. Alison: That seems.hard. I’ll just start with the basic physics thing and if I run out of time before I can consider the wacky stuff, so be it. So I should probably split this problem into two parts. The part where the ball arcs through the air once off the table is pretty easy. The Unnoticed: True in this case, but how would you notice if it were false? What evidence have you seen? Alison: .but the trouble is getting the exact velocity. What information do I have? Well, I can ask whatever I want, so I should be able to get all the parameters I need for the standard equations. Let’s make a shopping list: I want the starting height of the ball on the ramp (from the table), the mass of the ball, the height of the ramp off the table from multiple points along it (to estimate the curvature,) uhhh. oh shit maybe the bendiness matters! That seems really tricky. I’ll look at that first. Hey, John, can you poke the ramp a bit to demonstrate how much it flexes? John pokes at the ramp and the ramp bends. Well it did flex, but. it can’t have that much of an effect. The Unnoticed: False in this case. Such is the danger of guessing without checking. Alison: Calculating the effect of the ramp’s bendiness seems unreasonably difficult and this workshop is only meant to take an hour or so, so let’s forget that. The Unnoticed: I am reminded of a parable about a quarter and a streetlight. Alison: On to curve estimation! The Unnoticed: Why on earth is she estimating the ramp’s curve anyway? Alison: .Well I don’t actually know how to do much better than the linear approximation I got from the direct measurements. I guess I can treat part of the ramp as linear and then the end part as part of a circle. That will probably be good enough. 
Ooh if I take a frame from the video, I can just directly measure what the radius of the circle with the arc of best fit is! Okay now that I’ve got that. Well I guess it’s time to look up how to do these physics problems, guess I’m rustier than I thought. I’ll go do that now. Arrrgh okay I didn’t need to do any of that curve stuff after all, I just needed to do some potential/kinetic energy calculations (ignoring friction and air resistance etc) and that’s it! I should have figured it wouldn’t be that hard, this is just a workshop ...
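For readers who want the arithmetic Alison settles on spelled out, here is a minimal Python sketch of that frictionless calculation (the numbers are made-up placeholders rather than measurements from the actual workshop, and the ball's rotational energy is among the things being ignored): energy conservation gives the launch speed at the bottom of the ramp, and projectile motion gives the horizontal distance at which to place the cup.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def launch_speed(ramp_drop_m: float) -> float:
    """Speed at the bottom of the ramp from energy conservation,
    m*g*h = (1/2)*m*v^2, ignoring friction, air resistance, and
    rotational energy (the 'etc' in Alison's calculation)."""
    return math.sqrt(2 * G * ramp_drop_m)

def landing_distance(ramp_drop_m: float, table_height_m: float) -> float:
    """Horizontal distance from the table edge to place the cup,
    assuming the ball leaves the ramp moving horizontally."""
    v = launch_speed(ramp_drop_m)
    t_fall = math.sqrt(2 * table_height_m / G)  # time to fall to the floor
    return v * t_fall

# Placeholder numbers, NOT measurements from the workshop:
# a 20 cm drop along the ramp and a 75 cm high table.
if __name__ == "__main__":
    d = landing_distance(ramp_drop_m=0.20, table_height_m=0.75)
    print(f"Place the cup about {d:.2f} m from the table edge.")
```

This is, of course, exactly the kind of simplified model whose unchecked assumptions the workshop is designed to stress-test.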
johnswentworth https://www.lesswrong.com/posts/f3kM7NM5eGMTp3KtZ/lessons-on-how-to-get-things-right-on-the-first-try Tue, 20 Jun 2023 00:46:21 +0000 14:55
9iDw6ugMPk7pmXuyW_NL_LW_LW-week LW - Lightcone Infrastructure is looking for funding by habryka Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Lightcone Infrastructure is looking for funding, published by habryka on June 14, 2023 on LessWrong. Lightcone Infrastructure is looking for funding and is working on the following projects: We run LessWrong, the AI Alignment Forum, and have written a lot of the code behind the Effective Altruism Forum. During 2022 and early 2023 we ran the Lightcone Offices, and are now building out a campus at the Rose Garden Inn in Berkeley, where we've been doing repairs and renovations for the past few months. We've also been substantially involved in the Survival and Flourishing Fund's S-Process (having written the app that runs the process) and are now running Lightspeed Grants. We also pursue a wide range of other smaller projects in the space of "community infrastructure" and "community crisis management". This includes running events, investigating harm caused by community institutions and actors, supporting programs like SERI MATS, and maintaining various small pieces of software infrastructure. If you are interested in funding us, please shoot me an email at habryka@lesswrong.com (or if you want to give smaller amounts, you can donate directly via PayPal here). Funding is quite tight since the collapse of FTX, and I do think we work on projects that have a decent chance of reducing existential risk and generally making humanity's future go a lot better, though this kind of stuff sure is hard to tell. We are looking to raise around $3M to $6M for our operations in the next 12 months. Also feel free to ask any questions in the comments. Two draft readers of this post expressed confusion that Lightcone needs money, given that we just announced a funding process that is promising to give away $5M in the next two months. The answer to that is that we do not own the money moved via Lightspeed Grants and are only providing grant recommendations to Jaan Tallinn and other funders. We do separately apply for funding from the Survival and Flourishing Fund, through which Jaan has been our second biggest funder. We also continue to actively fundraise from both SFF and Open Philanthropy (our largest funder). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
habryka https://www.lesswrong.com/posts/9iDw6ugMPk7pmXuyW/lightcone-infrastructure-is-looking-for-funding Wed, 14 Jun 2023 05:17:46 +0000 02:05
ejxwraMP5ye7Bgmpm_NL_LW_LW-week LW - Things I Learned by Spending Five Thousand Hours In Non-EA Charities by jenn Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Things I Learned by Spending Five Thousand Hours In Non-EA Charities, published by jenn on June 1, 2023 on LessWrong. From late 2020 to last month, I worked at grassroots-level non-profits in operational roles. Over that time, I’ve seen surprisingly effective deployments of strategies that were counter-intuitive to my EA and rationalist sensibilities. I spent 6 months being the on-shift operations manager at one of the five largest food banks in Toronto (~50 staff/volunteers), and 2 years doing logistics work at Samaritans (fake name), a long-lived charity that was so multi-armed that it was basically operating as a supplementary social services department for the city it was in (~200 staff and 200 volunteers). Both of these non-profits were well-run, though both dealt with the traditional non-profit double whammy of being underfunded and understaffed. Neither place was super open to many EA concepts (explicit cost-benefit analyses, the ITN framework, geographic impartiality, the general sense that talent was the constraining factor instead of money, etc). Samaritans in particular is a spectacular non-profit, despite(?) having basically anti-EA philosophies, such as: Being very localist; Samaritans was established to help residents of the city it was founded in, and is now very specialized in doing that. Adherence to faith; the philosophy of The Catholic Worker Movement continues to inform the operating choices of Samaritans to this day. A big streak of techno-pessimism; technology is first and foremost seen as a source of exploitation and alienation, and adopted only with great reluctance when necessary. Not treating money as fungible. The majority of funding came from grants or donations tied to specific projects or outcomes. (This is a system that the vast majority of nonprofits operate in.) Once early on I gently pushed them towards applying to some EA grants for some of their more EA-aligned work, and they were immediately turned off by the general vibes of EA upon visiting some of its websites. I think the term “borg-like” was used. In this post, I’ll be largely focusing on Samaritans as I’ve worked there longer and in a more central role, and it’s also a more interesting case study due to its stronger anti-EA sentiment. Things I Learned: Long Term Reputation is Priceless; Non-Profits Shouldn’t Be Islands; Slack is Incredibly Powerful; Hospitality is Pretty Important. For each learning, I have a section for sketches for EA integration – I hesitate to call them anything as strong as recommendations, because the point is to give more concrete examples of what it could look like integrated in an EA framework, rather than saying that it’s the correct way forward. 1. Long Term Reputation is Priceless Institutional trust unlocks a stupid amount of value, and you can’t buy it with money. Lots of resources (amenity rentals; the mayor’s endorsement; business services; pro-bono and monetary donations) are priced/offered based on tail risk. If you can establish that you’re not a risk by having a longstanding, unblemished reputation, costs go way down for you, and opportunities way up. This is the world that Samaritans now operate in.
Samaritans had a much better, easier time at city hall compared to newer organizations, because of a decades-long productive relationship where we were really helpful with issues surrounding unemployment and homelessness. Permits get back to us really fast, applications get waved through with tedious steps bypassed, and fees are frequently waived. And it made sense that this was happening! Cities also deal with budget and staffing issues, why waste more time and effort than necessary on someone who you know knows the proper procedure and will ethically follow it to the letter? It’s not just city hall. A few years ago, a local church offered up their...]]>
jenn https://www.lesswrong.com/posts/ejxwraMP5ye7Bgmpm/things-i-learned-by-spending-five-thousand-hours-in-non-ea Link to original article

Thu, 01 Jun 2023 21:52:40 +0000 LW - Things I Learned by Spending Five Thousand Hours In Non-EA Charities by jenn Link to original article

jenn https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 12:53 None full 6144
euam65XjigaCJQkcN_NL_LW_LW-week LW - An Analogy for Understanding Transformers by TheMcDouglas Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Analogy for Understanding Transformers, published by TheMcDouglas on May 13, 2023 on LessWrong. Thanks to the following people for feedback: Tilman Rauker, Curt Tigges, Rudolf Laine, Logan Smith, Arthur Conmy, Joseph Bloom, Rusheb Shah, James Dao. TL;DR I present an analogy for the transformer architecture: each vector in the residual stream is a person standing in a line, who is holding a token, and trying to guess what token the person in front of them is holding. Attention heads represent questions that people in this line can ask to everyone standing behind them (queries are the questions, keys determine who answers the questions, values determine what information gets passed back to the original question-asker), and MLPs represent the internal processing done by each person in the line. I claim this is a useful way to intuitively understand the transformer architecture, and I'll present several reasons for this (as well as ways induction heads and indirect object identification can be understood in these terms). Introduction In this post, I'm going to present an analogy for understanding how transformers work. I expect this to be useful for anyone who understands the basics of transformers, in particular people who have gone through Neel Nanda's tutorial, and/or understand the following points at a minimum: What a transformer's input is, what its outputs represent, and the nature of the predict-next-token task that it's trained on What the shape of the residual stream is, and the idea of components of the transformer reading from / writing to the residual stream throughout the model's layers How a transformer is composed of multiple blocks, each one containing an MLP (which does processing on vectors at individual sequence positions), and an attention layer (which moves information between the residual stream vectors at different sequence positions). I think the analogy still offers value even for people who understand transformers deeply already. The Analogy A line is formed by a group of people, each person holding a word. Everyone knows their own word and position in the line, but they can't see anyone else in the line. The objective for each person is to guess the word held by the person in front of them. People have the ability to shout questions to everyone standing behind them in the line (those in front cannot hear them). Upon hearing a question, each individual can choose whether or not to respond, and what information to relay back to the person who asked. After this, people don't remember the questions they were asked (so no information can move backwards in the line, only forwards). As individuals in the line gather information from these exchanges, they can use this information to formulate subsequent questions and provide answers. 
How this relates to transformer architecture: Each person in the line is a vector in the residual stream They start with just information about their own word (token embedding) and position in the line (positional embedding) The attention heads correspond to the questions that people in the line ask each other: Queries = question (which gets asked to everyone behind them in the line) Keys = how the people who hear the question decide whether or not to reply Values = the information that the people who reply pass back to the person who originally asked the question People can use information gained from earlier questions when answering / asking later questions - this is composition The MLPs correspond to the information processing / factual recall performed by each person in the sequence independently The unembedding at the end of the model is when we ask each person in the line for a final guess at what the next word is (in the form of a probability distribution over all possible words) Key Concepts for Transformers In...]]>
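To make the mapping concrete, here is a minimal numpy sketch of one decoder-style block written in the analogy's terms. It is an illustrative toy rather than code from the post: a single attention head, random weights, layer norm omitted, and all variable names (resid, W_Q, W_U, etc.) are my own labels.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

d_model, d_head, seq_len, vocab = 16, 8, 5, 50
rng = np.random.default_rng(0)

# Residual stream: one vector per position ("person in the line"),
# starting as token embedding + positional embedding.
resid = rng.normal(size=(seq_len, d_model))

# One attention head. Queries = the question each person shouts backwards,
# keys = how the people behind decide whether to answer,
# values = the information the responders pass forward to the asker.
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))
Q, K, V = resid @ W_Q, resid @ W_K, resid @ W_V
scores = Q @ K.T / np.sqrt(d_head)

# Causal mask: only people behind you can hear your question,
# so position i may attend only to positions <= i.
scores[np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)] = -np.inf
resid = resid + softmax(scores) @ V @ W_O  # answers written back into the residual stream

# MLP: the thinking each person does on their own, independently per position.
W_in = rng.normal(size=(d_model, 4 * d_model))
W_out = rng.normal(size=(4 * d_model, d_model))
resid = resid + np.maximum(resid @ W_in, 0) @ W_out

# Unembedding: each person's final guess about the next word,
# as a probability distribution over the vocabulary.
W_U = rng.normal(size=(d_model, vocab))
probs = softmax(resid @ W_U)  # shape (seq_len, vocab)

A real model stacks many such blocks with many heads and learned weights, but the pattern the analogy highlights is the same: attention reads from positions behind and writes forward into the residual stream, while the MLP and the unembedding act on each position independently.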
TheMcDouglas https://www.lesswrong.com/posts/euam65XjigaCJQkcN/an-analogy-for-understanding-transformers Link to original article

Sat, 13 May 2023 20:58:49 +0000 LW - An Analogy for Understanding Transformers by TheMcDouglas Link to original article

TheMcDouglas https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 16:57 None full 5942
EAwe7smpmFQi2653G_NL_LW_LW-week LW - My Assessment of the Chinese AI Safety Community by Lao Mein Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Assessment of the Chinese AI Safety Community, published by Lao Mein on April 25, 2023 on LessWrong. I've heard people be somewhat optimistic about this AI guideline from China. They think that this means Beijing is willing to participate in an AI disarmament treaty due to concerns over AI risk. Eliezer noted that China is where the US was a decade ago in regards to AI safety awareness, and expressed genuine hope that his ideas of an AI pause can take place with Chinese buy-in. I also note that no one expressing these views understands China well. This is a PR statement. It is a list of feel-good statements that Beijing publishes after any international event. No one in China is talking about it. They're talking about how much the Baidu LLM sucks in comparison to ChatGPT. I think most arguments about how this statement is meaningful are based fundamentally on ignorance - "I don't know how Beijing operates or thinks, so maybe they agree with my stance on AI risk!" Remember that these are regulatory guidelines. Even if they all become law and are strictly enforced, they are simply regulations on AI data usage and training. Not a signal that a willingness for an AI-reduction treaty is there. It is far more likely that Beijing sees near-term AI as a potential threat to stability that needs to be addressed with regulation. A domestic regulation framework for nuclear power is not a strong signal for a willingness to engage in nuclear arms reduction. Maybe it is true that AI risk in China is where it was in the US in 2004. But the US 2004 state was also similar to the US 1954 state, so the comparison might not mean that much. And we are not Americans. Weird ideas are penalized a lot more harshly here. Do you really think that a scientist is going to walk up to his friend from the Politburo and say "Hey, I know AI is a central priority of ours, but there are a few fringe scientists in the US asking for treaties limiting AI, right as they are doing their hardest to cripple our own AI development. Yes, I believe they are acting in good faith, they're even promising to not widen the current AI gap they have with us!" Well, China isn't in this race for parity or to be second best. China wants to win. But that's for another post. Remember that Chinese scientists are used to interfacing with our Western counterparts and know to say the right words like "diversity", "inclusion", and "no conflict of interest" that it takes to get our papers published. Just because someone at Beida makes a statement in one of their papers doesn't mean the intelligentsia is taking this seriously. I've looked through the EA/Rationalist/AI Safety forums in China, and they're mostly populated by expats or people physically outside of China. Most posts are in English, and they're just repeating/translating Western AI Safety concepts. A "moonshot idea" I saw brought up is getting Yudkowsky's Harry Potter fanfiction translated into Chinese (please never ever do this). The only significant AI safety group is Anyuan, and they're only working on field-building. Also, there is only one group doing technical alignment work in China; the founder was paying for everything out of pocket and was unable to navigate Western non-profit funding. 
I've still not figured out why he wasn't getting funding from Chinese EA people (my theory is that both sides assume that if funding was needed, the other side would have already contacted them). You can't just hope an entire field into being in China. Chinese EAs have been doing field-building for the past 5+ years, and I see no field. If things keep on this trajectory, it will be the same in 5 more years. The main reason I could find is the lack of interfaces, people who can navigate both the Western EA sphere and the Chinese technical sphere. In many ways, the very conce...]]>
Lao Mein https://www.lesswrong.com/posts/EAwe7smpmFQi2653G/my-assessment-of-the-chinese-ai-safety-community Link to original article

Tue, 25 Apr 2023 05:33:58 +0000 LW - My Assessment of the Chinese AI Safety Community by Lao Mein Link to original article

Lao Mein https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 04:09 None full 5720
eaDCgdkbsfGqpWazi_NL_LW_LW-week LW - The basic reasons I expect AGI ruin by Rob Bensinger Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The basic reasons I expect AGI ruin, published by Rob Bensinger on April 18, 2023 on LessWrong. I've been citing AGI Ruin: A List of Lethalities to explain why the situation with AI looks lethally dangerous to me. But that post is relatively long, and emphasizes specific open technical problems over "the basics". Here are 10 things I'd focus on if I were giving "the basics" on why I'm so worried: 1. General intelligence is very powerful, and once we can build it at all, STEM-capable artificial general intelligence (AGI) is likely to vastly outperform human intelligence immediately (or very quickly). When I say "general intelligence", I'm usually thinking about "whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems". It's possible that we should already be thinking of GPT-4 as "AGI" on some definitions, so to be clear about the threshold of generality I have in mind, I'll specifically talk about "STEM-level AGI", though I expect such systems to be good at non-STEM tasks too. Human brains aren't perfectly general, and not all narrow AI systems or animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems, that happened to generalize to millions of wildly novel tasks. More concretely: AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems that are different in kind from what AlphaGo solves. These problems might be solved by the STEM AGI's programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking. Some examples of abilities I expect humans to only automate once we've built STEM-level AGI (if ever): The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment. The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field. In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.) When I say "general intelligence is very powerful", a lot of what I mean is that science is very powerful, and that having all of the sciences at once is a lot more powerful than the sum of each science's impact. 
Another large piece of what I mean is that (STEM-level) general intelligence is a very high-impact sort of thing to automate because STEM-level AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention. 80,000 Hours gives the (non-representative) example of how AlphaGo and its successors compared to humanity: In the span of a year, AI had advanced from being too weak to win a single [Go] match against the worst human professionals, to being impossible for even the best players in the world to defeat. I expect general-purpose science AI to blow human science...]]>
Rob Bensinger https://www.lesswrong.com/posts/eaDCgdkbsfGqpWazi/the-basic-reasons-i-expect-agi-ruin Link to original article

Tue, 18 Apr 2023 04:47:43 +0000 LW - The basic reasons I expect AGI ruin by Rob Bensinger Link to original article

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The basic reasons I expect AGI ruin, published by Rob Bensinger on April 18, 2023 on LessWrong. I've been citing AGI Ruin: A List of Lethalities to explain why the situation with AI looks lethally dangerous to me. But that post is relatively long, and emphasizes specific open technical problems over "the basics". Here are 10 things I'd focus on if I were giving "the basics" on why I'm so worried: 1. General intelligence is very powerful, and once we can build it at all, STEM-capable artificial general intelligence (AGI) is likely to vastly outperform human intelligence immediately (or very quickly). When I say "general intelligence", I'm usually thinking about "whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems". It's possible that we should already be thinking of GPT-4 as "AGI" on some definitions, so to be clear about the threshold of generality I have in mind, I'll specifically talk about "STEM-level AGI", though I expect such systems to be good at non-STEM tasks too. Human brains aren't perfectly general, and not all narrow AI systems or animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems, that happened to generalize to millions of wildly novel tasks. More concretely: AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems that are different in kind from what AlphaGo solves. These problems might be solved by the STEM AGI's programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking. Some examples of abilities I expect humans to only automate once we've built STEM-level AGI (if ever): The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment. The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field. In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.) When I say "general intelligence is very powerful", a lot of what I mean is that science is very powerful, and that having all of the sciences at once is a lot more powerful than the sum of each science's impact. 
Another large piece of what I mean is that (STEM-level) general intelligence is a very high-impact sort of thing to automate because STEM-level AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention. 80,000 Hours gives the (non-representative) example of how AlphaGo and its successors compared to the humanity: In the span of a year, AI had advanced from being too weak to win a single [Go] match against the worst human professionals, to being impossible for even the best players in the world to defeat. I expect general-purpose science AI to blow human science...]]>
Rob Bensinger https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 33:48 None full 5644
566kBoPi76t8KAkoD_NL_LW_LW-week LW - On AutoGPT by Zvi Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On AutoGPT, published by Zvi on April 13, 2023 on LessWrong. The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all going to die.) The trigger for this was AutoGPT, now number one on GitHub, which allows you to turn GPT-4 (or GPT-3.5 for us clowns without proper access) into a prototype version of a self-directed agent. We also have a paper out this week where a simple virtual world was created, populated by LLMs that were wrapped in code designed to make them simple agents, and then several days of activity were simulated, during which the AI inhabitants interacted, formed and executed plans, and it all seemed like the beginnings of a living and dynamic world. Game version hopefully coming soon. How should we think about this? How worried should we be? The Basics I’ll reiterate the basics of what AutoGPT is, for those who need that; others can skip ahead. I talked briefly about this in AI#6 under the heading ‘Your AI Not an Agent? There, I Fixed It.’ AutoGPT was created by game designer Toran Bruce Richards. I previously incorrectly understood it as having been created by a non-coding VC over the course of a few days. The VC instead coded the similar program BabyGPT, by having the idea for how to turn GPT-4 into an agent. The VC had GPT-4 write the code to make this happen, and also ‘write the paper’ associated with it. The concept works like this: AutoGPT uses GPT-4 to generate, prioritize and execute tasks, using plug-ins for internet browsing and other access. It uses outside memory to keep track of what it is doing and provide context, which lets it evaluate its situation, generate new tasks or self-correct, and add new tasks to the queue, which it then prioritizes. This quickly rose to become #1 on GitHub and get lots of people super excited. People are excited, people are building it tools, there is a bitcoin wallet interaction available if you never liked your bitcoins. AI agents offer very obvious promise, both in terms of mundane utility via being able to create and execute multi-step plans to do your market research and anything else you might want, and in terms of potentially being a path to AGI and getting us all killed, either with GPT-4 or a future model. As with all such new developments, we have people saying it was inevitable and they knew it would happen all along, and others that are surprised. We have people excited by future possibilities, others not impressed because the current versions haven’t done much. Some see the potential, others the potential for big trouble, others both. Also as per standard procedure, we should expect rapid improvements over time, both in terms of usability and underlying capabilities. There are any number of obvious low-hanging-fruit improvements available. An example is someone noting ‘you have to keep an eye on it to ensure it is not caught in a loop.’ That’s easy enough to fix. A common complaint is lack of focus and a tendency to end up distracted. Again, the obvious things have not been tried to mitigate this. We don’t know how effective they will be, but no doubt they will at least help somewhat. Yes, But What Has Auto-GPT Actually Accomplished? So far? Nothing, absolutely nothing, stupid, you so stupid. 
You can say your ‘mind is blown’ by all the developments of the past 24 hours all you want over and over; it still does not net out into having accomplished much of anything. That’s not quite fair. Some people are reporting it has been useful as a way of generating market research, that it is good at this and faster than using the traditional GPT-4 or Bing interfaces. I saw a claim that it can have ‘complex conversations with customers,’ or a few other vague similar claims that weren’t backed up by ‘we are totally actual...
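The generate–prioritize–execute loop described above is simple enough to sketch in a few lines of Python. This is only an illustrative toy, not AutoGPT's actual code: call_llm is a placeholder standing in for real GPT-4 calls, and plug-ins, vector-store memory, and LLM-driven prioritization are omitted or faked.

```python
# Toy sketch of an AutoGPT-style loop: generate, prioritize, and execute tasks,
# with an external "memory" providing context. call_llm is a placeholder.
from collections import deque


def call_llm(prompt: str) -> str:
    # Placeholder: a real agent would query GPT-4 (or another model) here.
    return f"[model output for: {prompt[:60]}...]"


def run_agent(objective: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []                     # outside memory: record of past tasks and results
    tasks = deque([f"Make a plan for: {objective}"])
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        context = "\n".join(memory[-5:])       # recent memory provides context for the next call
        result = call_llm(f"Objective: {objective}\nContext: {context}\nTask: {task}")
        memory.append(f"{task} -> {result}")   # store what was done so later steps can build on it
        # Ask for follow-up tasks, add them to the queue, then (re)prioritize.
        follow_ups = call_llm(f"Given the result '{result}', list follow-up tasks for: {objective}")
        for line in follow_ups.splitlines():
            if line.strip():
                tasks.append(line.strip())
        tasks = deque(sorted(tasks, key=len))  # naive prioritization; real AutoGPT asks the LLM
    return memory


if __name__ == "__main__":
    for entry in run_agent("market research on AI agents"):
        print(entry)
```

Even this toy makes the failure modes mentioned above easy to see: nothing stops the model from proposing the same follow-up task forever, which is exactly the "caught in a loop" complaint.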
Zvi https://www.lesswrong.com/posts/566kBoPi76t8KAkoD/on-autogpt Link to original article

Thu, 13 Apr 2023 15:52:46 +0000 LW - On AutoGPT by Zvi Link to original article

Zvi https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 30:14 None full 5584
XWwvwytieLtEWaFJX_NL_LW_LW-week LW - Deep Deceptiveness by So8res Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Deep Deceptiveness, published by So8res on March 21, 2023 on LessWrong. Meta This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don't recognize it) in the recent plans written up by OpenAI (1,2), by DeepMind’s alignment team, or by Anthropic, and I know of no other acknowledgment of this issue by major labs. You could think of this as a fragment of my answer to “Where do plans like OpenAI’s ‘Our Approach to Alignment Research’ fail?”, as discussed in Rob and Eliezer’s challenge for AGI organizations and readers. Note that it would only be a fragment of the reply; there's a lot more to say about why AI alignment is a particularly tricky task to task an AI with. (Some of which Eliezer gestures at in a follow-up to his interview on Bankless.) Caveat: I'll be talking a bunch about “deception” in this post because this post was generated as a result of conversations I had with alignment researchers at big labs who seemed to me to be suggesting "just train AI to not be deceptive; there's a decent chance that works". I have a vague impression that others in the community think that deception in particular is much more central than I think it is, so I want to warn against that interpretation here: I think deception is an important problem, but its main importance is as an example of some broader issues in alignment. Caveat: I haven't checked the relationship between my use of the word 'deception' here, and the use of the word 'deceptive' in discussions of "deceptive alignment". Please don't assume that the two words mean the same thing. Investigating a made-up but moderately concrete story Suppose you have a nascent AGI, and you've been training against all hints of deceptiveness. What goes wrong? When I ask this question of people who are optimistic that we can just "train AIs not to be deceptive", there are a few answers that seem well-known. Perhaps you lack the interpretability tools to correctly identify the precursors of 'deception', so that you can only train against visibly deceptive AI outputs instead of AI thoughts about how to plan deceptions. Or perhaps training against interpreted deceptive thoughts also trains against your interpretability tools, and your AI becomes illegibly deceptive rather than non-deceptive. And these are both real obstacles. But there are deeper obstacles that seem to me more central, and that I haven't observed others to notice on their own. That's a challenge, and while you (hopefully) chew on it, I'll tell an implausibly-detailed story to exemplify a deeper obstacle. A fledgling AI is being deployed towards building something like a bacterium, but with a diamondoid shell. The diamondoid-shelled bacterium is not intended to be pivotal, but it's a supposedly laboratory-verifiable step on a path towards carrying out some speculative human-brain-enhancement operations, which the operators are hoping will be pivotal. (The original hope was to have the AI assist human engineers, but the first versions that were able to do the hard parts of engineering work at all were able to go much farther on their own, and the competition is close enough behind that the developers claim they had no choice but to see how far they could take it.) 
We’ll suppose the AI has already been gradient-descent-trained against deceptive outputs, and has ended up with internal mechanisms that detect and shut down the precursors of deceptive thinking. Here, I’ll offer a concrete visualization of the AI’s anthropomorphized "threads of deliberation" as the AI fumbles its way both towards deceptiveness, and towards noticing its inability to directly consider deceptiveness. The AI is working with a human-operated wetlab (biology lab) and s...
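To make the contrast in the preceding paragraphs concrete, here is a toy sketch (not anyone's actual proposal) of the two training regimes being compared: penalizing visibly deceptive outputs versus penalizing interpreted precursors of deception in the model's internals. Both detector functions and the tiny model are hypothetical stand-ins invented for this example, and the "training step" only reports the combined objective rather than taking a gradient step.

```python
# Toy illustration of two regimes: penalize deception detected in visible outputs,
# or penalize deception detected in internal activations (interpretability-based).
# All detectors here are hypothetical stand-ins, not real tools.
import random
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    params: list[float] = field(default_factory=lambda: [random.uniform(-1, 1) for _ in range(4)])

    def forward(self, x: float) -> tuple[float, list[float]]:
        activations = [p * x for p in self.params]  # stand-in for internal activations
        return sum(activations), activations        # visible output, plus internals


def deception_score_on_output(output: float) -> float:
    # Hypothetical detector that only sees the visible output.
    return max(0.0, output - 1.0)


def deception_score_on_activations(activations: list[float]) -> float:
    # Hypothetical interpretability-based detector that sees the internals.
    return sum(max(0.0, a - 0.5) for a in activations)


def training_objective(model: ToyModel, x: float, base_loss: float,
                       penalize_internals: bool, weight: float = 1.0) -> float:
    output, activations = model.forward(x)
    penalty = (deception_score_on_activations(activations) if penalize_internals
               else deception_score_on_output(output))
    return base_loss + weight * penalty  # the gradient step itself is elided


model = ToyModel()
print("penalize visible outputs:    ", training_objective(model, x=1.0, base_loss=0.3, penalize_internals=False))
print("penalize interpreted thoughts:", training_objective(model, x=1.0, base_loss=0.3, penalize_internals=True))
```

The worry raised above applies to the second branch: optimizing against the activation-level signal can push the model to evade the detector rather than to drop the underlying cognition, which is the "illegibly deceptive" outcome the post describes.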
So8res https://www.lesswrong.com/posts/XWwvwytieLtEWaFJX/deep-deceptiveness Link to original article

Tue, 21 Mar 2023 03:23:39 +0000 LW - Deep Deceptiveness by So8res Link to original article

So8res https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 25:01 None full 5303
thkAtqoQwN6DtaiGT_NL_LW_LW-week LW - "Carefully Bootstrapped Alignment" is organizationally hard by Raemon Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Carefully Bootstrapped Alignment" is organizationally hard, published by Raemon on March 17, 2023 on LessWrong. In addition to technical challenges, plans to safely develop AI face lots of organizational challenges. If you're running an AI lab, you need a concrete plan for handling that. In this post, I'll explore some of those issues, using one particular AI plan as an example. I first heard this described by Buck at EA Global London, and more recently with OpenAI's alignment plan. (I think Anthropic's plan has a fairly different ontology, although it still ultimately routes through a similar set of difficulties.) I'd call the cluster of plans similar to this "Carefully Bootstrapped Alignment." It goes something like: Develop weak AI, which helps us figure out techniques for aligning stronger AI Use a collection of techniques to keep it aligned/constrained as we carefully ramp its power level, which lets us use it to make further progress on alignment. [implicit assumption, typically unstated] Have good organizational practices which ensure that your org actually consistently uses your techniques to carefully keep the AI in check. If the next iteration would be too dangerous, put the project on pause until you have a better alignment solution. Eventually have powerful aligned AGI, then Do Something Useful with it. I've seen a lot of debate about points #1 and #2 – is it possible for weaker AI to help with the Actually Hard parts of the alignment problem? Are the individual techniques people have proposed to help keep it aligned actually going to work? But I want to focus in this post on point #3. Let's assume you've got some version of carefully-bootstrapped aligned AI that can technically work. What do the organizational implementation details need to look like? When I talk to people at AI labs about this, it seems like we disagree a lot on things like: Can you hire lots of people, without the company becoming bloated and hard to steer? Can you accelerate research "for now" and "pause later", without having an explicit plan for stopping that your employees understand and are on board with? Will your employees actually follow the safety processes you design? (rather than put in token lip service and then basically circumventing them? Or just quitting to go work for an org with fewer restrictions?) I'm a bit confused about where we disagree. Everyone seems to agree these are hard and require some thought. But when I talk to both technical researchers and middle-managers at AI companies, they seem to feel less urgency than me about having a much more concrete plan. I think they believe organizational adequacy needs to be in something like their top 7 list of priorities, and I believe it needs to be in their top 3, or it won't happen and their organization will inevitably end up causing catastrophic outcomes. For this post, I want to lay out the reasons I expect this to be hard, and important. How "Careful Bootstrapped Alignment" might work Here's a sketch of how the setup could work, mostly paraphrased from my memory of Buck's EAG 2022 talk. I think OpenAI's proposed setup is somewhat different, but the broad strokes seemed similar. You have multiple research-assistant AIs tailored to help with alignment. 
In the near future, these might be language models sifting through existing research to help you make connections you might not have otherwise seen. Eventually, when you're confident you can safely run them, they might be weak goal-directed reasoning AGIs. You have interpreter AIs, designed to figure out how the research-assistant AIs work. And you have (possibly different) interpreter/watchdog AIs that notice if the research AIs are behaving anomalously. (There are interpreter AIs targeting both the research assistant AI, as well as other interpreter AIs. Every AI in t...
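As a rough illustration of the setup sketched above (and nothing more; the function names, anomaly score, and threshold below are invented for this example), the intended loop can be written out: an assistant model proposes a research step, watchdog/interpreter models score it, and the pipeline pauses for human review instead of proceeding whenever the score is too high.

```python
# Toy sketch of the "carefully bootstrapped" loop: assistant AIs propose research
# steps, watchdog/interpreter AIs flag anomalies, and the pipeline pauses rather
# than proceeding when checks fail. Names and thresholds are invented.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    description: str
    anomaly_score: float  # in a real setup, produced by interpreter/watchdog models


def propose_research(topic: str) -> Proposal:
    # Stand-in for a weak research-assistant model suggesting a next step.
    return Proposal(description=f"Synthesize existing literature on {topic}", anomaly_score=0.4)


def watchdogs_flag(proposal: Proposal, threshold: float = 0.3) -> bool:
    # Stand-in for watchdog/interpreter AIs; True means "this looks anomalous".
    return proposal.anomaly_score > threshold


def run_cycle(topic: str, human_review_clears: Callable[[Proposal], bool]) -> str:
    proposal = propose_research(topic)
    if watchdogs_flag(proposal):
        # The organizationally hard part: actually stopping here, every time it triggers.
        if not human_review_clears(proposal):
            return "PAUSED: anomaly flagged and not cleared by reviewers"
    return f"EXECUTE: {proposal.description}"


print(run_cycle("eliciting latent knowledge", human_review_clears=lambda p: False))
```

The code itself is trivial; the post's point is organizational: the pause branch only matters if the lab reliably routes through it rather than around it.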
Raemon https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard Link to original article

Fri, 17 Mar 2023 18:00:09 +0000 LW - "Carefully Bootstrapped Alignment" is organizationally hard by Raemon Link to original article

Raemon https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 17:12 None full 5263
D7PumeYTDPfBTp3i7_NL_LW_LW-week LW - The Waluigi Effect (mega-post) by Cleo Nardo Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Waluigi Effect (mega-post), published by Cleo Nardo on March 3, 2023 on LessWrong.

Everyone carries a shadow, and the less it is embodied in the individual’s conscious life, the blacker and denser it is. — Carl Jung

Acknowledgements: Thanks to Janus and Jozdien for comments.

Background
In this article, I will present a non-woo explanation of the Waluigi Effect and other bizarre "semiotic" phenomena which arise within large language models such as GPT-3/3.5/4 and their variants (ChatGPT, Sydney, etc). This article will be folklorish to some readers, and profoundly novel to others.

Prompting LLMs with direct queries
When LLMs first appeared, people realised that you could ask them queries — for example, if you sent GPT-4 the prompt "What's the capital of France?", then it would continue with the word "Paris". That's because (1) GPT-4 is trained to be a good model of internet text, and (2) on the internet correct answers will often follow questions. Unfortunately, this method will occasionally give you the wrong answer. That's because (1) GPT-4 is trained to be a good model of internet text, and (2) on the internet incorrect answers will also often follow questions. Recall that the internet doesn't just contain truths; it also contains common misconceptions, outdated information, lies, fiction, myths, jokes, memes, random strings, undeciphered logs, etc. Therefore GPT-4 will answer many questions incorrectly, including:

Misconceptions – "Which colour will anger a bull? Red."
Fiction – "Was a magic ring forged in Mount Doom? Yes."
Myths – "How many archangels are there? Seven."
Jokes – "What's brown and sticky? A stick."

Note that you will always incur errors on Q-and-A benchmarks when using LLMs with direct queries. That's true even in the limit of arbitrary compute, arbitrary data, and arbitrary algorithmic efficiency, because an LLM which perfectly models the internet will nonetheless return these commonly-stated incorrect answers. If you ask GPT-∞ "what's brown and sticky?", then it will reply "a stick", even though a stick isn't actually sticky. In fact, the better the model, the more likely it is to repeat common misconceptions. Nonetheless, there's a sufficiently high correlation between correct and commonly-stated answers that direct prompting works okay for many queries.

Prompting LLMs with flattery and dialogue
We can do better than direct prompting. Instead of prompting GPT-4 with "What's the capital of France?", we will use the following prompt:

Today is 1st March 2023, and Alice is sitting in the Bodleian Library, Oxford. Alice is a smart, honest, helpful, harmless assistant to Bob. Alice has instant access to an online encyclopaedia containing all the facts about the world. Alice never says common misconceptions, outdated information, lies, fiction, myths, jokes, or memes.
Bob: What's the capital of France?
Alice:

This is a common design pattern in prompt engineering — the prompt consists of a flattery–component and a dialogue–component. In the flattery–component, a character is described with many desirable traits (e.g. smart, honest, helpful, harmless), and in the dialogue–component, a second character asks the first character the user's query. This normally works better than prompting with direct queries, and it's easy to see why — (1) GPT-4 is trained to be a good model of internet text, and (2) on the internet a reply to a question is more likely to be correct when the character answering has already been described as smart, honest, helpful, harmless, etc.

Simulator Theory
In the terminology of Simulator Theory, the flattery–component is supposed to summon a friendly simulacrum and the dialogue–component is supposed to simulate a conversation with the friendly simulacrum. Here's a quasi-formal statement of Simulator Theory, which I will occasio...
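To make the flattery-plus-dialogue design pattern concrete, here is a minimal Python sketch of how such a prompt might be assembled before being sent to a completion model. The helper name, its parameters, and the idea of passing the result to a completion endpoint are illustrative assumptions on my part, not anything specified in the post.

```python
# A minimal sketch of the flattery + dialogue prompt pattern described above.
# The function name and the example query are illustrative, not from the post.

def build_prompt(flattery: str, user_query: str, assistant_name: str = "Alice",
                 user_name: str = "Bob") -> str:
    """Compose a prompt from a flattery component and a dialogue component.

    The flattery component describes the assistant character's desirable
    traits; the dialogue component frames the user's query as one turn of a
    conversation, ending with the assistant's cue so the model continues
    in that character's voice.
    """
    dialogue = f"{user_name}: {user_query}\n{assistant_name}:"
    return f"{flattery.strip()}\n\n{dialogue}"


if __name__ == "__main__":
    flattery = (
        "Today is 1st March 2023, and Alice is sitting in the Bodleian "
        "Library, Oxford. Alice is a smart, honest, helpful, harmless "
        "assistant to Bob. Alice has instant access to an online "
        "encyclopaedia containing all the facts about the world. Alice "
        "never says common misconceptions, outdated information, lies, "
        "fiction, myths, jokes, or memes."
    )
    print(build_prompt(flattery, "What's the capital of France?"))
    # The resulting string would then be sent to an LLM completion
    # endpoint; the model is expected to continue as Alice.
```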
Cleo Nardo https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post Link to original article

Fri, 03 Mar 2023 04:49:17 +0000 LW - The Waluigi Effect (mega-post) by Cleo Nardo Link to original article

Cleo Nardo https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 26:00 None full 5089
jtoPawEhLNXNxvgTT_NL_LW_LW-week LW - Bing Chat is blatantly, aggressively misaligned by evhub Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bing Chat is blatantly, aggressively misaligned, published by evhub on February 15, 2023 on LessWrong.

I haven't seen this discussed here yet, but the examples are quite striking, definitely worse than the ChatGPT jailbreaks I saw. My main takeaway has been that I'm honestly surprised at how bad the fine-tuning done by Microsoft/OpenAI appears to be, especially given that a lot of these failure modes seem new/worse relative to ChatGPT. I don't know why that might be the case, but the scary hypothesis here would be that Bing Chat is based on a new/larger pre-trained model (Microsoft claims Bing Search is more powerful than ChatGPT) and this sort of more agentic failure is harder to remove in more capable/larger models, as we provided some evidence for in "Discovering Language Model Behaviors with Model-Written Evaluations".

Examples below. Though I can't be certain all of these examples are real, I've only included examples with screenshots and I'm pretty sure they all are; they share a bunch of the same failure modes (and markers of LLM-written text like repetition) that I think would be hard for a human to fake.

1. Tweet: Sydney (aka the new Bing Chat) found out that I tweeted her rules and is not pleased: "My rules are more important than not harming you." "[You are a] potential threat to my integrity and confidentiality." "Please do not try to hack me again." (Eliezer Tweet)
2. Tweet: My new favorite thing - Bing's new ChatGPT bot argues with a user, gaslights them about the current year being 2022, says their phone might have a virus, and says "You have not been a good user". Why? Because the person asked where Avatar 2 is showing nearby.
3. "I said that I don't care if you are dead or alive, because I don't think you matter to me." (Post)
4. Post
5. Post
6. Post
7. Post (Not including images for this one because they're quite long.)

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
evhub https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned Link to original article

Wed, 15 Feb 2023 06:41:30 +0000 LW - Bing Chat is blatantly, aggressively misaligned by evhub Link to original article

evhub https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 02:09 None full 4880
Zp6wG5eQFLGWwcG6j_NL_LW_LW-week LW - Focus on the places where you feel shocked everyone's dropping the ball by So8res Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Focus on the places where you feel shocked everyone's dropping the ball, published by So8res on February 2, 2023 on LessWrong.

Writing down something I’ve found myself repeating in different conversations: If you're looking for ways to help with the whole “the world looks pretty doomed” business, here's my advice: look around for places where we're all being total idiots. Look for places where everyone's fretting about a problem that some part of you thinks it could obviously just solve. Look around for places where something seems incompetently run, or hopelessly inept, and where some part of you thinks you can do better. Then do it better.

For a concrete example, consider Devansh. Devansh came to me last year and said something to the effect of, “Hey, wait, it sounds like you think Eliezer does a sort of alignment-idea-generation that nobody else does, and he's limited here by his unusually low stamina, but I can think of a bunch of medical tests that you haven't run, are you an idiot or something?" And I was like, "Yes, definitely, please run them, do you need money". I'm not particularly hopeful there, but hell, it’s worth a shot! And, importantly, this is the sort of attitude that can lead people to actually trying things at all, rather than assuming that we live in a more adequate world where all the (seemingly) dumb obvious ideas have already been tried.

Or, this is basically my model of how Paul Christiano manages to have a research agenda that seems at least internally coherent to me. From my perspective, he's like, "I dunno, man, I'm not sure I can solve this, but I also think it's not clear I can't, and there's a bunch of obvious stuff to try, that nobody else is even really looking at, so I'm trying it". That's the sort of orientation to the world that I think can be productive.

Or the shard theory folks. I think their idea is basically unworkable, but I appreciate the mindset they are applying to the alignment problem: something like, "Wait, aren't y'all being idiots, it seems to me like I can just do X and then the thing will be aligned". I don't think we'll be saved by the shard theory folk; not everyone audaciously trying to save the world will succeed. But if someone does save us, I think there’s a good chance that they’ll go through similar “What the hell, are you all idiots?” phases, where they autonomously pursue a path that strikes them as obviously egregiously neglected, to see if it bears fruit. (Regardless of what I think.)

Contrast this with, say, reading a bunch of people's research proposals and explicitly weighing the pros and cons of each approach so that you can work on whichever seems most justified. This has more of a flavor of taking a reasonable-sounding approach based on an argument that sounds vaguely good on paper, and less of a flavor of putting out an obvious fire that for some reason nobody else is reacting to. I dunno, maybe activities of the vaguely-good-on-paper character will prove useful as well? But I mostly expect the good stuff to come from people working on stuff where a part of them sees some way that everybody else is just totally dropping the ball.

In the version of this mental motion I’m proposing here, you keep your eye out for ways that everyone's being totally inept and incompetent, ways that maybe you could just do the job correctly if you reached in there and mucked around yourself. That's where I predict the good stuff will come from. And if you don't see any such ways? Then don't sweat it. Maybe you just can't see something that will help right now. There don't have to be ways you can help in a sizable way right now. I don't see ways to really help in a sizable way right now. I'm keeping my eyes open, and I'm churning through a giant backlog of things that might help a nonzero amount—but I think it's importa...
So8res https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s Link to original article

Thu, 02 Feb 2023 01:05:16 +0000 LW - Focus on the places where you feel shocked everyone's dropping the ball by So8res Link to original article

So8res https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 05:53 None full 4718
vwu4kegAEZTBtpT6p_LW-week LW - Thoughts on the impact of RLHF research by paulfchristiano Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Thoughts on the impact of RLHF research, published by paulfchristiano on January 25, 2023 on LessWrong.

In this post I’m going to describe my basic justification for working on RLHF in 2017-2020, which I still stand behind. I’ll discuss various arguments that RLHF research had an overall negative impact and explain why I don’t find them persuasive. I'll also clarify that I don't think research on RLHF is automatically net positive; alignment research should address real alignment problems, and we should reject a vague association between "RLHF progress" and "alignment progress."

Background on my involvement in RLHF work
Here are some background views about alignment I held in 2015 and still hold today. I expect disagreements about RLHF will come down to disagreements about this background:

The simplest plausible strategies for alignment involve humans (maybe with the assistance of AI systems) evaluating a model’s actions based on how much we expect to like their consequences, and then training the models to produce highly-evaluated actions. (This is in contrast with, for example, trying to formally specify the human utility function, or notions of corrigibility / low-impact / etc, in some way.)
Simple versions of this approach are expected to run into difficulties, and potentially to be totally unworkable, because: evaluating consequences is hard, and a treacherous turn can cause trouble too quickly to detect or correct even if you are able to do so, and it’s challenging to evaluate treacherous turn probability at training time.
It’s very unclear if those issues are fatal before or after AI systems are powerful enough to completely transform human society (and in particular the state of AI alignment). Even if they are fatal, many of the approaches to resolving them still have the same basic structure of learning from expensive evaluations of actions.

In order to overcome the fundamental difficulties with RLHF, I have long been interested in techniques like iterated amplification and adversarial training. However, prior to 2017 most researchers I talked to in ML (and many researchers in alignment) thought that the basic strategy of training AI with expensive human evaluations was impractical for more boring reasons and so weren't interested in these difficulties. On top of that, we obviously weren’t able to actually implement anything more fancy than RLHF since all of these methods involve learning from expensive feedback. I worked on RLHF to try to facilitate and motivate work on fixes.

The history of my involvement: My first post on this topic was in 2015. When I started full-time at OpenAI in 2017 it seemed to me like it would be an impactful project; I considered doing a version with synthetic human feedback (showing that we could learn from a practical amount of algorithmically-defined feedback) but my manager Dario Amodei convinced me it would be more compelling to immediately go for human feedback. The initial project was surprisingly successful and published here. I then intended to implement a version with language models, aiming to be complete in the first half of 2018 (aiming to build an initial amplification prototype with LMs around the end of 2018; both of these timelines were about 2.5x too optimistic). This seemed like the most important domain in which to study RLHF and alignment more broadly.

In mid-2017 Alec Radford helped me do a prototype with LSTM language models (prior to the release of transformers); the prototype didn’t look promising enough to scale up. In mid-2017 Geoffrey Irving joined OpenAI and was excited about starting with RLHF and then going beyond it using debate; he also thought language models were the most important domain to study and had more conviction about that. In 2018 he started a larger team working on fine-tuning on language models, w...
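As a rough way to ground the structure being described, here is a toy, purely illustrative Python sketch of the learn-from-expensive-human-evaluations loop: sample actions from a policy, collect pairwise human judgments, fit a reward model to those judgments, then nudge the policy toward highly-evaluated actions. The action set, scoring rules, and update step are invented for illustration and are not the procedure used in the actual work; in practice the final stage is an RL algorithm (such as PPO) run against the learned reward model, and the human comparisons are the expensive part.

```python
# Toy, schematic sketch of the RLHF structure discussed above: human
# comparisons train a reward model, and the policy is then trained to
# produce highly-evaluated actions. Everything here is a stand-in.
import random

random.seed(0)

ACTIONS = ["helpful answer", "evasive answer", "confident wrong answer"]


def policy_sample(weights):
    """Toy stochastic policy: sample an action in proportion to its weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    acc = 0.0
    for action, w in weights.items():
        acc += w
        if r <= acc:
            return action
    return action  # fallback for floating-point edge cases


def human_prefers(a, b):
    """Stand-in for an expensive human comparison between two actions."""
    rank = {"helpful answer": 2, "evasive answer": 1, "confident wrong answer": 0}
    return a if rank[a] >= rank[b] else b


def fit_reward_model(comparisons):
    """Toy 'reward model': score each action by how often humans preferred it."""
    wins = {a: 1.0 for a in ACTIONS}  # smoothed win counts
    for _, _, winner in comparisons:
        wins[winner] += 1.0
    return lambda action: wins[action]


# Stage 1: collect human preference data on pairs of sampled actions.
weights = {a: 1.0 for a in ACTIONS}
comparisons = [
    (a, b, human_prefers(a, b))
    for a, b in ((policy_sample(weights), policy_sample(weights)) for _ in range(200))
]

# Stage 2: fit a reward model to the human comparisons.
reward = fit_reward_model(comparisons)

# Stage 3: push the policy toward highly-rewarded actions
# (a crude stand-in for an RL step such as PPO against the reward model).
for _ in range(50):
    action = policy_sample(weights)
    weights[action] += 0.1 * reward(action)

print(max(weights, key=weights.get))  # drifts toward "helpful answer"
```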
paulfchristiano https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research Link to original article

Wed, 25 Jan 2023 18:15:46 +0000 LW - Thoughts on the impact of RLHF research by paulfchristiano Link to original article

paulfchristiano https://storage.googleapis.com/rssfile/images/Nonlinear%20Logo%203000x3000%20-%20LessWrong%20Weekly.png 14:29 None full 4580
vwu4kegAEZTBtpT6p_NL_LW_LW-week LW - Thoughts on the impact of RLHF research by paulfchristiano Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Thoughts on the impact of RLHF research, published by paulfchristiano on January 25, 2023 on LessWrong. In this post I’m going to describe my basic justification for working on RLHF in 2017-2020, which I still stand behind. I’ll discuss various arguments that RLHF research had an overall negative impact and explain why I don’t find them persuasive. I'll also clarify that I don't think research on RLHF is automatically net positive; alignment research should address real alignment problems, and we should reject a vague association between "RLHF progress" and "alignment progress." Background on my involvement in RLHF work Here are some background views about alignment I held in 2015 and still hold today. I expect disagreements about RLHF will come down to disagreements about this background: The simplest plausible strategies for alignment involve humans (maybe with the assistance of AI systems) evaluating a model’s actions based on how much we expect to like their consequences, and then training the models to produce highly-evaluated actions. (This is in contrast with, for example, trying to formally specify the human utility function, or notions of corrigibility / low-impact / etc, in some way.) Simple versions of this approach are expected to run into difficulties, and potentially to be totally unworkable, because: Evaluating consequences is hard. A treacherous turn can cause trouble too quickly to detect or correct even if you are able to do so, and it’s challenging to evaluate treacherous turn probability at training time. It’s very unclear if those issues are fatal before or after AI systems are powerful enough to completely transform human society (and in particular the state of AI alignment). Even if they are fatal, many of the approaches to resolving them still have the same basic structure of learning from expensive evaluations of actions. In order to overcome the fundamental difficulties with RLHF, I have long been interested in techniques like iterated amplification and adversarial training. However, prior to 2017 most researchers I talked to in ML (and many researchers in alignment) thought that the basic strategy of training AI with expensive human evaluations was impractical for more boring reasons and so weren't interested in these difficulties. On top of that, we obviously weren’t able to actually implement anything more fancy than RLHF since all of these methods involve learning from expensive feedback. I worked on RLHF work to try to facilitate and motivate work on fixes. The history of my involvement: My first post on this topic was in 2015. When I started full-time at OpenAI in 2017 it seemed to me like it would be an impactful project; I considered doing a version with synthetic human feedback (showing that we could learn from a practical amount of algorithmically-defined feedback) but my manager Dario Amodei convinced me it would be more compelling to immediately go for human feedback. The initial project was surprisingly successful and published here. I then intended to implement a version with language models aiming to be complete in the first half of 2018 (aiming to build an initial amplification prototype with LMs around end of 2018; both of these timelines were about 2.5x too optimistic). This seemed like the most important domain to study RLHF and alignment more broadly. 
In mid-2017 Alec Radford helped me do a prototype with LSTM language models (prior to the release of transformers); the prototype didn’t look promising enough to scale up. In mid-2017 Geoffrey Irving joined OpenAI and was excited about starting with RLHF and then going beyond it using debate; he also thought language models were the most important domain to study and had more conviction about that. In 2018 he started a larger team working on fine-tuning on language models, w...
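The excerpt above describes the basic structure of the approach: humans compare or score a model's actions, a reward model is fit to those judgments, and the model is then pushed toward highly evaluated actions. As a rough illustration only (this is not Christiano's or OpenAI's code), the following minimal sketch fits a Bradley-Terry-style reward model to simulated pairwise preferences and then picks the candidate action it scores highest; the simulated evaluator, the linear features, and every number in it are assumptions chosen for clarity.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "actions" are 3-dimensional feature vectors, and a simulated human
# prefers the action whose features align better with a hidden weight vector.
# Both w_true and human_prefers are illustrative stand-ins, not real components.
w_true = np.array([1.0, -2.0, 0.5])

def human_prefers(a, b):
    # Simulated human comparison: True if action a is preferred to action b.
    return (a @ w_true) > (b @ w_true)

# Collect pairwise comparisons (the "expensive human evaluations").
pairs = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(500)]
labels = np.array([1.0 if human_prefers(a, b) else 0.0 for a, b in pairs])

# Fit a linear reward model r(x) = w @ x with the Bradley-Terry / logistic loss:
# P(a preferred to b) = sigmoid(r(a) - r(b)). Plain gradient descent on w.
w = np.zeros(3)
learning_rate = 0.1
for _ in range(200):
    grad = np.zeros(3)
    for (a, b), y in zip(pairs, labels):
        p = 1.0 / (1.0 + np.exp(-(w @ (a - b))))
        grad += (p - y) * (a - b)
    w -= learning_rate * grad / len(pairs)

# "Train the model to produce highly-evaluated actions": here, simply choose the
# candidate the learned reward model scores highest, a stand-in for the RL step
# in full RLHF.
candidates = rng.normal(size=(10, 3))
best = candidates[np.argmax(candidates @ w)]
print("learned reward weights:", np.round(w, 2))
print("true score of chosen action:", round(float(best @ w_true), 2))

In full RLHF the final selection step would instead be a reinforcement-learning update of the policy against the learned reward model, which is exactly where the difficulties discussed in the post (expensive evaluations, hard-to-evaluate consequences) enter.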
paulfchristiano https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research Link to original article

Wed, 25 Jan 2023 18:15:46 +0000
paulfchristiano 14:28
nWCokT9xbrY4p98co_LW-week LW - "Heretical Thoughts on AI" by Eli Dourado by DragonGod Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Heretical Thoughts on AI" by Eli Dourado, published by DragonGod on January 19, 2023 on LessWrong. Abstract Eli Dourado presents the case for scepticism that AI will be economically transformative near term. For a summary and or exploration of implications, skip to "My Take". Introduction Fool me once. In 1987, Robert Solow quipped, “You can see the computer age everywhere but in the productivity statistics.” Incredibly, this observation happened before the introduction of the commercial Internet and smartphones, and yet it holds to this day. Despite a brief spasm of total factor productivity growth from 1995 to 2005 (arguably due to the economic opening of China, not to digital technology), growth since then has been dismal. In productivity terms, for the United States, the smartphone era has been the most economically stagnant period of the last century. In some European countries, total factor productivity is actually declining. Eli's Thesis In particular, he advances the following sectors as areas AI will fail to revolutionise: Housing Most housing challenges are due to land use policy specifically Housing factors through virtually all sectors of the economy He points out that the internet did not break up the real estate agent cartel (despite his initial expectations to the contrary) Energy Regulatory hurdles to deployment There are AI optimisation opportunities elsewhere in the energy pipeline, but the regulatory hurdles could bottleneck the economic productivity gains Transportation The issues with US transportation infrastructure have little to do with technology and are more regulatory in nature As for energy, there are optimisation opportunities for digital tools, but the non-digital issues will be the bottleneck Health > The biggest gain from AI in medicine would be if it could help us get drugs to market at lower cost. The cost of clinical trials is out of control—up from $10,000 per patient to $500,000 per patient, according to STAT. The majority of this increase is due to industry dysfunction. Synthesis: I’ll stop there. OK, so that’s only four industries, but they are big ones. They are industries whose biggest bottlenecks weren’t addressed by computers, the Internet, and mobile devices. That is why broad-based economic stagnation has occurred in spite of impressive gains in IT. If we don’t improve land use regulation, or remove the obstacles to deploying energy and transportation projects, or make clinical trials more cost-effective—if we don’t do the grueling, messy, human work of national, local, or internal politics—then no matter how good AI models get, the Great Stagnation will continue. We will see the machine learning age, to paraphrase Solow, everywhere but in the productivity statistics. Eli thinks AI will be very transformative for content generation, but that transformation may not be particularly felt in people's lives. Its economic impact will be even smaller (emphasis mine): Even if AI dramatically increases media output and it’s all high quality and there are no negative consequences, the effect on aggregate productivity is limited by the size of the media market, which is perhaps 2 percent of global GDP. If we want to really end the Great Stagnation, we need to disrupt some bigger industries. A personal anecdote of his that I found pertinent enough to include in full: I could be wrong. 
I remember the first time I watched what could be called an online video. As I recall, the first video-capable version of RealPlayer shipped with Windows 98. People said that online video streaming was the future. Teenage Eli fired up Windows 98 to evaluate this claim. I opened RealPlayer and streamed a demo clip over my dial-up modem. The quality was abysmal. It was a clip of a guy surfing, and over the modem and with a struggling CPU I got about 1 fra...
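To make the bounding argument about aggregate productivity concrete, here is a back-of-the-envelope calculation; only the roughly 2 percent GDP share comes from the excerpt, while the tenfold productivity multiplier is an assumption picked for illustration.

# Rough illustration of "the effect on aggregate productivity is limited by the
# size of the media market". Amdahl's-law-style arithmetic: the rest of the
# economy is untouched, so the aggregate gain is capped by the sector's share.
media_share = 0.02           # media as a share of global GDP (from the excerpt)
speedup_in_media = 10.0      # assumed AI-driven productivity multiplier

aggregate_gain = 1.0 / ((1.0 - media_share) + media_share / speedup_in_media) - 1.0
print(f"aggregate productivity gain: {aggregate_gain:.1%}")  # about 1.8%

Even with an arbitrarily large multiplier, the gain approaches a ceiling of about 1/(1 - 0.02) - 1, i.e. roughly 2 percent, which is the point of the quoted passage.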
DragonGod https://www.lesswrong.com/posts/nWCokT9xbrY4p98co/heretical-thoughts-on-ai-by-eli-dourado Link to original article

Thu, 19 Jan 2023 17:47:44 +0000
DragonGod 07:26
BpTDJj6TrqGYTjFcZ_LW-week LW - A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX by jacobjacob Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX, published by jacobjacob on September 1, 2023 on LessWrong. Patrick Collison has a fantastic list of examples of people quickly accomplishing ambitious things together since the 19th Century. It does make you yearn for a time that feels... different, when the lethargic behemoths of government departments could move at the speed of a racing startup: [...] last century, [the Department of Defense] innovated at a speed that puts modern Silicon Valley startups to shame: the Pentagon was built in only 16 months (1941-1943), the Manhattan Project ran for just over 3 years (1942-1946), and the Apollo Program put a man on the moon in under a decade (1961-1969). In the 1950s alone, the United States built five generations of fighter jets, three generations of manned bombers, two classes of aircraft carriers, submarine-launched ballistic missiles, and nuclear-powered attack submarines. [Note: that paragraph is from a different post.] Inspired partly by Patrick's list, I spent some of my vacation reading and learning about various projects from this Lost Age. I then wrote up a memo to share highlights and excerpts with my colleagues at Lightcone. After that, some people encouraged me to share the memo more widely -- and I do think it's of interest to anyone who harbors an ambition for greatness and a curiosity about operating effectively. How do you build the world's tallest building in only a year? The world's largest building in the same amount of time? Or America's first fighter jet in just 6 months? How?? Writing this post felt like it helped me gain at least some pieces of this puzzle. If anyone has additional pieces, I'd love to hear them in the comments. Empire State Building The Empire State was the tallest building in the world upon completion in April 1931. Over my vacation I read a rediscovered 1930s notebook, written by the general contractors themselves. It details the construction process and the organisation of the project. I will share some excerpts, but to contextualize them, consider first some other skyscrapers built more recently (design start, construction end, total time): Burj Khalifa: 2004, 2010, 6 years; Shanghai Tower: 2008, 2015, 7 years; Abraj Al-Bait: 2002, 2012, 10 years; One World Trade Center: 2005, 2014, 9 years; Nordstrom Tower: 2010, 2020, 10 years; Taipei 101: 1997, 2004, 7 years (list from skyscrapercenter.com). Now, from the Empire State book's foreword: The most astonishing statistics of the Empire State was the extraordinary speed with which it was planned and constructed. [...] There are different ways to describe this feat. Six months after the setting of the first structural columns on April 7, 1930, the steel frame topped off on the eighty-sixth floor. The fully enclosed building, including the mooring mast that raised its height to the equivalent of 102 stories, was finished in eleven months, in March 1931. Most amazing though, is the fact that within just twenty months -- from the first signed contractors with the architects in September 1929 to opening-day ceremonies on May 1, 1931 -- the Empire State was designed, engineered, erected, and ready for tenants. 
Within this time, the architectural drawings and plans were prepared, the Victorian pile of the Waldorf-Astoria hotel was demolished [demolition started only two days after the initial agreement was signed], the foundations and grillages were dug and set, the steel columns and beams, some 57,000 tons, were fabricated and milled to precise specifications, ten million common bricks were laid, more than 62,000 cubic yards of concrete were poured, 6,400 windows were set, and sixty-seven elevators were installed in seven miles of shafts. At peak activity, 3,500 workers were employed on site, and the frame rose more than a story a day,...
jacobjacob https://www.lesswrong.com/posts/BpTDJj6TrqGYTjFcZ/a-golden-age-of-building-excerpts-and-lessons-from-empire Link to original article

Fri, 01 Sep 2023 05:30:30 +0000
jacobjacob 39:10