
Artificial Intelligence and Its Implications for Future Suffering

Brian Tomasik
Center on Long-Term Risk

[email protected]

June 2016*

* First written: May 2014; last modified: Jun. 2016

Abstract

Artificial intelligence (AI) will transform the world later this century. I expect this transition will be a "soft takeoff" in which many sectors of society update together in response to incremental AI developments, though the possibility of a harder takeoff in which a single AI project "goes foom" shouldn't be ruled out. If a rogue AI gained control of Earth, it would proceed to accomplish its goals by colonizing the galaxy and undertaking some very interesting achievements in science and engineering. On the other hand, it would not necessarily respect human values, including the value of preventing the suffering of less powerful creatures. Whether a rogue-AI scenario would entail more expected suffering than other scenarios is a question to explore further. Regardless, the field of AI ethics and policy seems to be a very important space where altruists can make a positive-sum impact along many dimensions. Expanding dialogue and challenging us-vs.-them prejudices could be valuable.

Contents

1 Summary

2 Introduction

3 Is "the singularity" crazy?

4 The singularity is more than AI

5 Will society realize the importance of AI?

6 A soft takeoff seems more likely?

7 Intelligence explosion?

8 Reply to Bostrom's arguments for a hard takeoff


9 How complex is the brain?
9.1 One basic algorithm?
9.2 Ontogenetic development

10 Brain quantity vs. quality

11 More impact in hard-takeoff scenarios?

12 Village idiot vs. Einstein

13 A case for epistemic modesty on AI timelines

14 Intelligent robots in your backyard

15 Is automation "for free"?

16 Caring about the AI's goals

17 Rogue AI would not share our values

18 Would a human-inspired AI or rogue AI cause more suffering?

19 Would helper robots feel pain?

20 How accurate would simulations be?

21 Rogue AIs can take off slowly

22 Would superintelligences become existentialists?

23 AI epistemology

24 Artificial philosophers

25 Would all AIs colonize space?

26 Who will first develop human-level AI?

27 One hypothetical AI takeoff scenario

28 How do you socialize an AI?
28.1 Treacherous turn
28.2 Following role models?

29 AI superpowers?

30 How big would a superintelligence be?

31 Another hypothetical AI takeoff scenario


32 AI: More like the economy than like robots?

33 Importance of whole-brain emulation

34 Why work against brain-emulation risks appeals to suffering reducers

35 Would emulation work accelerate neuromorphic AI?

36 Are neuromorphic or mathematical AIs more controllable?

37 Impacts of empathy for AIs
37.1 Slower AGI development?
37.2 Attitudes toward AGI control

38 Charities working on this issue

39 Is MIRI's work too theoretical?

40 Next steps

41 Where to push for maximal impact?

42 Is it valuable to work at or influence an AGI company?

43 Should suffering reducers focus on AGI safety?

44 Acknowledgments

References

1 Summary

Artificial intelligence (AI) will transform the world later this century. I expect this transition will be a "soft takeoff" in which many sectors of society update together in response to incremental AI developments, though the possibility of a harder takeoff in which a single AI project "goes foom" shouldn't be ruled out. If a rogue AI gained control of Earth, it would proceed to accomplish its goals by colonizing the galaxy and undertaking some very interesting achievements in science and engineering. On the other hand, it would not necessarily respect human values, including the value of preventing the suffering of less powerful creatures. Whether a rogue-AI scenario would entail more expected suffering than other scenarios is a question to explore further. Regardless, the field of AI ethics and policy seems to be a very important space where altruists can make a positive-sum impact along many dimensions. Expanding dialogue and challenging us-vs.-them prejudices could be valuable.

2 Introduction

This piece contains some observations on what looks to be potentially a coming machine revolution in Earth's history. For general background reading, a good place to start is Wikipedia's article on the technological singularity.

I am not an expert on all the arguments in this field, and my views remain very open to change with new information. In the face of epistemic disagreements with other very smart observers, it makes sense to grant some credence to a variety of viewpoints. Each person brings unique contributions to the discussion by virtue of his or her particular background, experience, and intuitions.

To date, I have not found a detailed analysis of how those who are moved more by preventing suffering than by other values should approach singularity issues. This seems to me a serious gap, and research on this topic deserves high priority. In general, it's important to expand discussion of singularity issues to encompass a broader range of participants than the engineers, technophiles, and science-fiction nerds who have historically pioneered the field.

I. J. Good (1982) observed: "The urgent drives out the important, so there is not very much written about ethical machines". Fortunately, this may be changing.

3 Is "the singularity" crazy?

In fall 2005, a friend pointed me to Ray Kurzweil's (2000) The Age of Spiritual Machines. This was my first introduction to "singularity" ideas, and I found the book pretty astonishing. At the same time, much of it seemed rather implausible to me. In line with the attitudes of my peers, I assumed that Kurzweil was crazy and that while his ideas deserved further inspection, they should not be taken at face value.

In 2006 I discovered Nick Bostrom and Eliezer Yudkowsky, and I began to follow the organization then called the Singularity Institute for Artificial Intelligence (SIAI), which is now MIRI. I took SIAI's ideas more seriously than Kurzweil's, but I remained embarrassed to mention the organization because the first word in SIAI's name sets off "insanity alarms" in listeners.

I began to study machine learning in order to get a better grasp of the AI field, and in fall 2007, I switched my college major to computer science. As I read textbooks and papers about machine learning, I felt as though "narrow AI" was very different from the strong-AI fantasies that people painted. "AI programs are just a bunch of hacks," I thought. "This isn't intelligence; it's just people using computers to manipulate data and perform optimization, and they dress it up as 'AI' to make it sound sexy." Machine learning in particular seemed to be just a computer scientist's version of statistics. Neural networks were just an elaborated form of logistic regression. There were stylistic differences, such as computer science's focus on cross-validation and bootstrapping instead of testing parametric models – made possible because computers can run data-intensive operations that were inaccessible to statisticians in the 1800s. But overall, this work didn't seem like the kind of "real" intelligence that people talked about for general AI.
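To make the "elaborated logistic regression" remark concrete: a single sigmoid unit trained with gradient descent on cross-entropy loss is exactly logistic regression, and modern networks stack many such units with nonlinearities in between. A minimal illustrative sketch (my own code, not from the original essay):

# Illustrative sketch: one sigmoid "neuron" fit by gradient descent on
# cross-entropy loss is exactly logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_unit(X, y, lr=0.1, steps=2000):
    """Fit weights w and bias b of one sigmoid unit, i.e., logistic regression."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)            # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)   # gradient of mean cross-entropy
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: the label is 1 when x0 + x1 > 1.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = (X.sum(axis=1) > 1.0).astype(float)
w, b = train_single_unit(X, y)
print("learned weights:", w, "bias:", b)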

This attitude began to change as I learned more cognitive science. Before 2008, my ideas about human cognition were vague. Like most science-literate people, I believed the brain was a product of physical processes, including firing patterns of neurons. But I lacked further insight into what the black box of brains might contain. This led me to be confused about what "free will" meant until mid-2008 and about what "consciousness" meant until late 2009. Cognitive science showed me that the brain was in fact very much like a computer, at least in the sense of being a deterministic information-processing device with distinct algorithms and modules. When viewed up close, these algorithms could look as "dumb" as the kinds of algorithms in narrow AI that I had previously dismissed as "not really intelligence." Of course, animal brains combine these seemingly dumb subcomponents in dazzlingly complex and robust ways, but I could now see that the difference between narrow AI and brains was a matter of degree rather than kind. It now seemed plausible that broad AI could emerge from lots of work on narrow AI combined with stitching the parts together in the right ways.

So the singularity idea of artificial general intelligence seemed less crazy than it had initially. This was one of the rare cases where a bold claim turned out to look more probable on further examination; usually extraordinary claims lack much evidence and crumble on closer inspection. I now think it's quite likely (maybe ∼75%) that humans will produce at least a human-level AI within the next ∼300 years conditional on no major disasters (such as sustained world economic collapse, global nuclear war, large-scale nanotech war, etc.), and also ignoring anthropic considerations (Bostrom, 2010).

4 The singularity is more than AI

The "singularity" concept is broader than the prediction of strong AI and can refer to several distinct sub-meanings. Like with most ideas, there's a lot of fantasy and exaggeration associated with "the singularity," but at least the core idea that technology will progress at an accelerating rate for some time to come, absent major setbacks, is not particularly controversial. Exponential growth is the standard model in economics, and while this can't continue forever, it has been a robust pattern throughout human and even pre-human history.

MIRI emphasizes AI for a good reason: At the end of the day, the long-term future of our galaxy will be dictated by AI, not by biotech, nanotech, or other lower-level systems. AI is the "brains of the operation." Of course, this doesn't automatically imply that AI should be the primary focus of our attention. Maybe other revolutionary technologies or social forces will come first and deserve higher priority. In practice, I think focusing on AI specifically seems quite important even relative to competing scenarios, but it's good to explore many areas in parallel to at least a shallow depth.

In addition, I don't see a sharp distinction between "AI" and other fields. Progress in AI software relies heavily on computer hardware, and it depends at least a little bit on other fundamentals of computer science, like programming languages, operating systems, distributed systems, and networks. AI also shares significant overlap with neuroscience; this is especially true if whole brain emulation arrives before bottom-up AI. And everything else in society matters a lot too: How intelligent and engineering-oriented are citizens? How much do governments fund AI and cognitive-science research? (I'd encourage less rather than more.) What kinds of military and commercial applications are being developed? Are other industrial backbone components of society stable? What memetic lenses does society have for understanding and grappling with these trends? And so on. The AI story is part of a larger story of social and technological change, in which one part influences other parts.

Significant trends in AI may not look like the AI we see in movies. They may not involve animal-like cognitive agents as much as more "boring", business-oriented computing systems. Some of the most transformative computer technologies in the period 2000-2014 have been drones, smart phones, and social networking. These all involve some AI, but the AI is mostly used as a component of a larger, non-AI system, in which many other facets of software engineering play at least as much of a role.

Nonetheless, it seems nearly inevitable to me that digital intelligence in some form will eventually leave biological humans in the dust, if technological progress continues without faltering. This is almost obvious when we zoom out and notice that the history of life on Earth consists in one species outcompeting another, over and over again. Ecology's competitive exclusion principle suggests that in the long run, either humans or machines will ultimately occupy the role of the most intelligent beings on the planet, since "When one species has even the slightest advantage or edge over another then the one with the advantage will dominate in the long term."

5 Will society realize the importance of AI?

The basic premise of superintelligent machines who have different priorities than their creators has been in public consciousness for many decades. Arguably even Frankenstein, published in 1818, expresses this basic idea, though more modern forms include 2001: A Space Odyssey (1968), The Terminator (1984), I, Robot (2004), and many more. Probably most people in Western countries have at least heard of these ideas if not watched or read pieces of fiction on the topic.

So why do most people, including many of society's elites, ignore strong AI as a serious issue? One reason is just that the world is really big, and there are many important (and not-so-important) issues that demand attention. Many people think strong AI is too far off, and we should focus on nearer-term problems. In addition, it's possible that science fiction itself is part of the reason: People may write off AI scenarios as "just science fiction," as I would have done prior to late 2005. (Of course, this is partly for good reason, since depictions of AI in movies are usually very unrealistic.) Often, citing Hollywood is taken as a thought-stopping deflection of the possibility of AI getting out of control, without much in the way of substantive argument to back up that stance. For example: "let's please keep the discussion firmly within the realm of reason and leave the robot uprisings to Hollywood screenwriters."

As AI progresses, I find it hard to imagine that mainstream society will ignore the topic forever. Perhaps awareness will accrue gradually, or perhaps an AI Sputnik moment will trigger an avalanche of interest. Stuart Russell expects that

Just as nuclear fusion researchers consider the problem of containment of fusion reactions as one of the primary problems of their field, it seems inevitable that issues of control and safety will become central to AI as the field matures.

I think it's likely that issues of AI policy will be debated heavily in the coming decades, although it's possible that AI will be like nuclear weapons – something that everyone is afraid of but that countries can't stop because of arms-race dynamics. So even if AI proceeds slowly, there's probably value in thinking more about these issues well ahead of time, though I wouldn't consider the counterfactual value of doing so to be astronomical compared with other projects, in part because society will pick up the slack as the topic becomes more prominent.

[Update, Feb. 2015: I wrote the preceding paragraphs mostly in May 2014, before Nick Bostrom's Superintelligence book was released. Following Bostrom's book, a wave of discussion about AI risk emerged from Elon Musk, Stephen Hawking, Bill Gates, and many others. AI risk suddenly became a mainstream topic discussed by almost every major news outlet, at least with one or two articles. This foreshadows what we'll see more of in the future. The outpouring of publicity for the AI topic happened far sooner than I imagined it would.]

6 A soft takeoff seems more likely?

Various thinkers have debated the likelihood of a "hard" takeoff – in which a single computer or set of computers rapidly becomes superintelligent on its own – compared with a "soft" takeoff – in which society as a whole is transformed by AI in a more distributed, continuous fashion. "The Hanson-Yudkowsky AI-Foom Debate" discusses this in great detail (Hanson & Yudkowsky, 2013). The topic has also been considered by many others, such as Ramez Naam vs. William Hertling.

For a long time I inclined toward Yudkowsky's vision of AI, because I respect his opinions and didn't ponder the details too closely. This is also the more prototypical example of rebellious AI in science fiction. In early 2014, a friend of mine challenged this view, noting that computing power is a severe limitation for human-level minds. My friend suggested that AI advances would be slow and would diffuse through society rather than remaining in the hands of a single developer team. As I've read more AI literature, I think this soft-takeoff view is pretty likely to be correct. Science is always a gradual process, and almost all AI innovations historically have moved in tiny steps. I would guess that even the evolution of humans from their primate ancestors was a "soft" takeoff in the sense that no single son or daughter was vastly more intelligent than his or her parents. The evolution of technology in general has been fairly continuous. I probably agree with Paul Christiano that "it is unlikely that there will be rapid, discontinuous, and unanticipated developments in AI that catapult it to superhuman levels [...]."

Of course, it's not guaranteed that AI innovations will diffuse throughout society. At some point perhaps governments will take control, in the style of the Manhattan Project, and they'll keep the advances secret. But even then, I expect that the internal advances by the research teams will add cognitive abilities in small steps. Even if you have a theoretically optimal intelligence algorithm, it's constrained by computing resources, so you either need lots of hardware or approximation hacks (or most likely both) before it can function effectively in the high-dimensional state space of the real world, and this again implies a slower trajectory. Marcus Hutter's AIXI(tl) is an example of a theoretically optimal general intelligence, but most AI researchers feel it won't work for artificial general intelligence (AGI) because it's astronomically expensive to compute. Ben Goertzel explains: "I think that tells you something interesting. It tells you that dealing with resource restrictions – with the boundedness of time and space resources – is actually critical to intelligence. If you lift the restriction to do things efficiently, then AI and AGI are trivial problems."[1]

[1] Stuart Armstrong agrees that AIXI probably isn't a feasible approach to AGI, but he feels there might exist other, currently undiscovered mathematical insights like AIXI that could yield AGI in a very short time span. Maybe, though I think this is pretty unlikely. I suppose at least a few people should explore these scenarios, but plausibly most of the work should go toward pushing on the more likely outcomes.

In "I Still Don't Get Foom", Robin Hanson contends:

Yes, sometimes architectural choices have wider impacts. But I was an artificial intelligence researcher for nine years, ending twenty years ago, and I never saw an architecture choice make a huge difference, relative to other reasonable architecture choices. For most big systems, overall architecture matters a lot less than getting lots of detail right.

This suggests that it's unlikely that a single insight will make an astronomical difference to an AI's performance.

Similarly, my experience is that machine-learning algorithms matter less than the data they're trained on. I think this is a general sentiment among data scientists. There's a famous slogan that "More data is better data." A main reason Google's performance is so good is that it has so many users that even obscure searches, spelling mistakes, etc. will appear somewhere in its logs. But if many performance gains come from data, then they're constrained by hardware, which generally grows steadily.

Hanson's "I Still Don't Get Foom" post continues: "To be much better at learning, the project would instead have to be much better at hundreds of specific kinds of learning. Which is very hard to do in a small project." Anders Sandberg makes a similar point:

As the amount of knowledge grows, it becomes harder and harder to keep up and to get an overview, necessitating specialization. [...] This means that a development project might need specialists in many areas, which in turns means that there is a lower size of a group able to do the development. In turn, this means that it is very hard for a small group to get far ahead of everybody else in all areas, simply because it will not have the necessary know how in all necessary areas. The solution is of course to hire it, but that will enlarge the group.

One of the more convincing anti-"foom" arguments is J. Storrs Hall's (2008) point that an AI improving itself to a world superpower would need to outpace the entire world economy of 7 billion people, plus natural resources and physical capital. It would do much better to specialize, sell its services on the market, and acquire power/wealth in the ways that most people do. There are plenty of power-hungry people in the world, but usually they go to Wall Street, K Street, or Silicon Valley rather than trying to build world-domination plans in their basement. Why would an AI be different? Some possibilities:

1. By being built differently, it's able to concoct an effective world-domination strategy that no human has thought of.

2. Its non-human form allows it to diffuse throughout the Internet and make copies of itself.

I'm skeptical of #1, though I suppose if the AI is very alien, these kinds of unknown unknowns become more plausible. #2 is an interesting point. It seems like a pretty good way to spread yourself as an AI is to become a useful software product that lots of people want to install, i.e., to sell your services on the world market, as Hall said. Of course, once that's done, perhaps the AI could find a way to take over the world. Maybe it could silently quash competitor AI projects. Maybe it could hack into computers worldwide via the Internet and Internet of Things, as the AI did in the Delete series. Maybe it could devise a way to convince humans to give it access to sensitive control systems, as Skynet did in Terminator 3.

I find these kinds of scenarios for AI takeover more plausible than a rapidly self-improving superintelligence. Indeed, even a human-level intelligence that can distribute copies of itself over the Internet might be able to take control of human infrastructure and hence take over the world. No "foom" is required.

Rather than discussing hard-vs.-soft takeoff arguments more here, I added discussion to Wikipedia where the content will receive greater readership. See "Hard vs. soft takeoff" in "Recursive self-improvement".

The hard vs. soft distinction is obviously a matter of degree. And maybe how long the process takes isn't the most relevant way to slice the space of scenarios. For practical purposes, the more relevant question is: Should we expect control of AI outcomes to reside primarily in the hands of a few "seed AI" developers? In this case, altruists should focus on influencing a core group of AI experts, or maybe their military/corporate leaders. Or should we expect that society as a whole will play a big role in shaping how AI is developed and used? In this case, governance structures, social dynamics, and non-technical thinkers will play an important role not just in influencing how much AI research happens but also in how the technologies are deployed and incrementally shaped as they mature.

It's possible that one country – perhaps the United States, or maybe China in later decades – will lead the way in AI development, especially if the research becomes nationalized when AI technology grows more powerful. Would this country then take over the world? I'm not sure. The United States had a monopoly on nuclear weapons for several years after 1945, but it didn't bomb the Soviet Union out of existence. A country with a monopoly on artificial superintelligence might refrain from destroying its competitors as well. On the other hand, AI should enable vastly more sophisticated surveillance and control than was possible in the 1940s, so a monopoly might be sustainable even without resorting to drastic measures. In any case, perhaps a country with superintelligence would just economically outcompete the rest of the world, rendering military power superfluous.

Besides a single country taking over the world, the other possibility (perhaps more likely) is that AI is developed in a distributed fashion, either openly as is the case in academia today, or in secret by governments as is the case with other weapons of mass destruction.

Even in a soft-takeoff case, there would come a point at which humans would be unable to keep up with the pace of AI thinking. (We already see an instance of this with algorithmic stock-trading systems, although human traders are still needed for more complex tasks right now.) The reins of power would have to be transitioned to faster human uploads, trusted AIs built from scratch, or some combination of the two. In a slow scenario, there might be many intelligent systems at comparable levels of performance, maintaining a balance of power, at least for a while.[2] In the long run, a singleton (Bostrom, 2006) seems plausible because computers – unlike human kings – can reprogram their servants to want to obey their bidding, which means that as an agent gains more central authority, it's not likely to later lose it by internal rebellion (only by external aggression).

[2] Marcus Hutter imagines a society of AIs that compete for computing resources in a similar way as animals compete for food and space. Or like corporations compete for employees and market share. He suggests that such competition might render initial conditions irrelevant. Maybe, but it's also quite plausible that initial conditions would matter a lot. Many evolutionary pathways depended sensitively on particular events – e.g., asteroid impacts – and the same is true for national, corporate, and memetic power.

Most of humanity's problems are fundamentally coordination problems / selfishness problems. If humans were perfectly altruistic, we could easily eliminate poverty, overpopulation, war, arms races, and other social ills. There would remain "man vs. nature" problems, but these are increasingly disappearing as technology advances. Assuming a digital singleton emerges, the chances of it going extinct seem very small (except due to alien invasions or other external factors) because unless the singleton has a very myopic utility function, it should consider carefully all the consequences of its actions – in contrast to the "fools rush in" approach that humanity currently takes toward most technological risks, due to wanting the benefits of and profits from technology right away and not wanting to lose out to competitors. For this reason, I suspect that most of George Dvorsky's "12 Ways Humanity Could Destroy The Entire Solar System" are unlikely to happen, since most of them presuppose blundering by an advanced Earth-originating intelligence, but probably by the time Earth-originating intelligence would be able to carry out interplanetary engineering on a nontrivial scale, we'll already have a digital singleton that thoroughly explores the risks of its actions before executing them. That said, this might not be true if competing AIs begin astroengineering before a singleton is completely formed. (By the way, I should point out that I prefer it if the cosmos isn't successfully colonized, because doing so is likely to astronomically multiply sentience and therefore suffering.)

7 Intelligence explosion?

Sometimes it's claimed that we should expect a hard takeoff because AI-development dynamics will fundamentally change once AIs can start improving themselves. One stylized way to explain this is via differential equations. Let I(t) be the intelligence of AIs at time t.

• While humans are building AIs, we have dI/dt = c, where c is some constant level of human engineering ability. This implies I(t) = ct + constant, i.e., linear growth of I with time.

• In contrast, once AIs can design themselves, we'll have dI/dt = kI for some k. That is, the rate of growth will be faster as the AI designers become more intelligent. This implies I(t) = Ae^(kt) for some constant A (worked out below).
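Spelling out the two bullet points as equations (standard calculus, with I(0) denoting the intelligence level at time 0):

\[
  \frac{dI}{dt} = c \;\Rightarrow\; I(t) = I(0) + c\,t,
  \qquad
  \frac{dI}{dt} = kI \;\Rightarrow\; I(t) = I(0)\,e^{kt},
\]

so the constant A in the second case is just the starting level I(0).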

Luke Muehlhauser reports that the idea of intelligence explosion once machines can start improving themselves "ran me over like a train. Not because it was absurd, but because it was clearly true." I think this kind of exponential feedback loop is the basis behind many of the intelligence-explosion arguments.

But let's think about this more carefully. What's so special about the point where machines can understand and modify themselves? Certainly understanding your own source code helps you improve yourself. But humans already understand the source code of present-day AIs with an eye toward improving it. Moreover, present-day AIs are vastly simpler than human-level ones will be, and present-day AIs are far less intelligent than the humans who create them. Which is easier: (1) improving the intelligence of something as smart as you, or (2) improving the intelligence of something far dumber? (2) is usually easier. So if anything, AI intelligence should be "exploding" faster now, because it can be lifted up by something vastly smarter than it. Once AIs need to improve themselves, they'll have to pull up on their own bootstraps, without the guidance of an already existing model of far superior intelligence on which to base their designs.

As an analogy, it's harder to produce novel developments if you're the market-leading company; it's easier if you're a competitor trying to catch up, because you know what to aim for and what kinds of designs to reverse-engineer. AI right now is like a competitor trying to catch up to the market leader.

Another way to say this: The constants in the differential equations might be important. Even if human AI-development progress is linear, that progress might be faster than a slow exponential curve until some point far later where the exponential catches up.
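A small numerical sketch of this point, using purely illustrative constants (these numbers are assumptions for illustration, not estimates from this essay):

# Compare linear progress by human engineers, dI/dt = c, with a slow
# exponential of self-improvement, dI/dt = k*I, under hypothetical constants.
c, k, I0 = 1.0, 0.05, 1.0    # assumed: 1 unit/year vs. 5%/year, starting at 1

linear = lambda t: I0 + c * t
exponential = lambda t: I0 * (1 + k) ** t   # discrete-time approximation

for t in range(0, 141, 20):
    print(t, round(linear(t), 1), round(exponential(t), 1))
# With these constants the exponential path does not overtake the linear
# one until somewhere between t = 80 and t = 100 years.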

In any case, I'm cautious of simple differential equations like these. Why should the rate of intelligence increase be proportional to the intelligence level? Maybe the problems become much harder at some point. Maybe the systems become fiendishly complicated, such that even small improvements take a long time. Robin Hanson echoes this suggestion:

Students get smarter as they learn more, and learn how to learn. However, we teach the most valuable concepts first, and the productivity value of schooling eventually falls off, instead of exploding to infinity. Similarly, the productivity improvement of factory workers typically slows with time, following a power law.

At the world level, average IQ scores have increased dramatically over the last century (the Flynn effect), as the world has learned better ways to think and to teach. Nevertheless, IQs have improved steadily, instead of accelerating. Similarly, for decades computer and communication aids have made engineers much "smarter," without accelerating Moore's law. While engineers got smarter, their design tasks got harder.

Also, ask yourself this question: Why do startups exist? Part of the answer is that they can innovate faster than big companies due to having less institutional baggage and legacy software.[3] It's harder to make radical changes to big systems than small systems. Of course, like the economy does, a self-improving AI could create its own virtual startups to experiment with more radical changes, but just as in the economy, it might take a while to prove new concepts and then transition old systems to the new and better models.

[3] Another part of the answer has to do with incentive structures – e.g., a founder has more incentive to make a company succeed if she's mainly paid in equity than if she's paid large salaries along the way.

In discussions of intelligence explosion, it's common to approximate AI productivity as scaling linearly with number of machines, but this may or may not be true depending on the degree of parallelizability. Empirical examples for human-engineered projects show diminishing returns with more workers, and while computers may be better able to partition work due to greater uniformity and speed of communication, there will remain some overhead in parallelization. Some tasks may be inherently non-parallelizable, preventing the kinds of ever-faster performance that the most extreme explosion scenarios envisage.
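One standard way to formalize this parallelization limit (my label; the essay itself does not name it) is Amdahl's law: if a fraction p of the work parallelizes perfectly, then n machines give a speedup of at most 1 / ((1 - p) + p / n). A minimal sketch:

# Amdahl's law: diminishing returns from adding machines when part of the
# work is inherently serial.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (10, 100, 1000, 10**6):
    print(n, round(amdahl_speedup(0.95, n), 1))
# Even with 95% of the work parallelizable, the speedup is capped at 20x
# no matter how many machines are added.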

Fred Brooks's (1995) "No Silver Bullet" paper argued that "there is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement within a decade in productivity, in reliability, in simplicity." Likewise, Wirth's law reminds us of how fast software complexity can grow. These points make it seem less plausible that an AI system could rapidly bootstrap itself to superintelligence using just a few key as-yet-undiscovered insights.

Eventually there has to be a leveling off of intelligence increase if only due to physical limits. On the other hand, one argument in favor of differential equations is that the economy has fairly consistently followed exponential trends since humans evolved, though the exponential growth rate of today's economy remains small relative to what we typically imagine from an "intelligence explosion".

I think a stronger case for intelligence explosion is the clock-speed difference between biological and digital minds (Sotala, 2012). Even if AI development becomes very slow in subjective years, once AIs take it over, in objective years (i.e., revolutions around the sun), the pace will continue to look blazingly fast. But if enough of society is digital by that point (including human-inspired subroutines and maybe full digital humans), then digital speedup won't give a unique advantage to a single AI project that can then take over the world. Hence, hard takeoff in the sci-fi sense still isn't guaranteed. Also, Hanson argues that faster minds would produce a one-time jump in economic output but not necessarily a sustained higher rate of growth.

Another case for intelligence explosion is that intelligence growth might not be driven by the intelligence of a given agent so much as by the collective man-hours (or machine-hours) that would become possible with more resources. I suspect that AI research could accelerate at least 10 times if it had 10-50 times more funding. (This is not the same as saying I want funding increased; in fact, I probably want funding decreased to give society more time to sort through these issues.) The population of digital minds that could be created in a few decades might exceed the biological human population, which would imply faster progress if only by numerosity. Also, the digital minds might not need to sleep, would focus intently on their assigned tasks, etc. However, once again, these are advantages in objective time rather than collective subjective time. And these advantages would not be uniquely available to a single first-mover AI project; any wealthy and technologically sophisticated group that wasn't too far behind the cutting edge could amplify its AI development in this way.

(A few weeks after writing this section, I learned that Ch. 4 of Nick Bostrom's (2014) Superintelligence: Paths, Dangers, Strategies contains surprisingly similar content, even up to the use of dI/dt as the symbols in a differential equation. However, Bostrom comes down mostly in favor of the likelihood of an intelligence explosion. I reply to Bostrom's arguments in the next section.)

8 Reply to Bostrom's arguments for a hard takeoff

In Ch. 4 of Superintelligence, Bostrom suggests several factors that might lead to a hard or at least semi-hard takeoff. I don't fully disagree with his points, and because these are difficult issues, I agree that Bostrom might be right. But I want to play devil's advocate and defend the soft-takeoff view. I've distilled and paraphrased what I think are 6 core arguments, and I reply to each in turn.

#1: There might be a key missing algorithmic insight that allows for dramatic progress.

Maybe, but do we have much precedent for this? As far as I'm aware, all individual AI advances – and indeed, most technology advances in general – have not represented astronomical improvements over previous designs. Maybe connectionist AI systems represented a game-changing improvement relative to symbolic AI for messy tasks like vision, but I'm not sure how much of an improvement they represented relative to the best alternative technologies. After all, neural networks are in some sense just fancier forms of pre-existing statistical methods like logistic regression. And even neural networks came in stages, with the perceptron, multi-layer networks, backpropagation, recurrent networks, deep networks, etc. The most groundbreaking machine-learning advances may reduce error rates by a half or something, which may be commercially very important, but this is not many orders of magnitude as hard-takeoff scenarios tend to assume.

Outside of AI, the Internet changed the world, but it was an accumulation of many insights. Facebook has had massive impact, but it too was built from many small parts and grew in importance slowly as its size increased. Microsoft became a virtual monopoly in the 1990s but perhaps more for business than technology reasons, and its power in the software industry at large is probably not growing. Google has a quasi-monopoly on web search, kicked off by the success of PageRank, but most of its improvements have been small and gradual. Google has grown very powerful, but it hasn't maintained a permanent advantage that would allow it to take over the software industry.

Acquiring nuclear weapons might be the closest example of a single discrete step that most dramatically changes a country's position, but this may be an outlier. Maybe other advances in weaponry (arrows, guns, etc.) historically have had somewhat dramatic effects.

Bostrom doesn't present specific arguments for thinking that a few crucial insights may produce radical jumps. He suggests that we might not notice a system's improvements until it passes a threshold, but this seems absurd, because at least the AI developers would need to be intimately acquainted with the AI's performance. While not strictly accurate, there's a slogan: "You can't improve what you can't measure." Maybe the AI's progress wouldn't make world headlines, but the academic/industrial community would be well aware of nontrivial breakthroughs, and the AI developers would live and breathe performance numbers.

#2: Once an AI passes a threshold, it might be able to absorb vastly more content (e.g., by reading the Internet) that was previously inaccessible.

Absent other concurrent improvements, I'm doubtful this would produce take-over-the-world superintelligence, because the world's current superintelligence (namely, humanity as a whole) already has read most of the Internet – indeed, has written it. I guess humans haven't read automatically generated text/numerical content, but the insights gleaned purely from reading such material would be low without doing more sophisticated data mining and learning on top of it, and presumably such data mining would have already been in progress well before Bostrom's hypothetical AI learned how to read.

In any case, I doubt reading with understanding is such an all-or-nothing activity that it can suddenly "turn on" once the AI achieves a certain ability level. As Bostrom says (p. 71), reading with the comprehension of a 10-year-old is probably AI-complete, i.e., requires solving the general AI problem. So assuming that you can switch on reading ability with one improvement is equivalent to assuming that a single insight can produce astronomical gains in AI performance, which we discussed above. If that's not true, and if before the AI system with 10-year-old reading ability was an AI system with a 6-year-old reading ability, why wouldn't that AI have already devoured the Internet? And before that, why wouldn't a proto-reader have devoured a version of the Internet that had been processed to make it easier for a machine to understand? And so on, until we get to the present-day TextRunner system that Bostrom cites, which is already devouring the Internet. It doesn't make sense that massive amounts of content would only be added after lots of improvements. Commercial incentives tend to yield exactly the opposite effect: converting the system to a large-scale product when even modest gains appear, because these may be enough to snatch a market advantage.

The fundamental point is that I don't think there's a crucial set of components to general intelligence that all need to be in place before the whole thing works. It's hard to evolve systems that require all components to be in place at once, which suggests that human general intelligence probably evolved gradually. I expect it's possible to get partial AGI with partial implementations of the components of general intelligence, and the components can gradually be made more general over time. Components that are lacking can be supplemented by human-based computation and narrow-AI hacks until more general solutions are discovered. Compare with minimum viable products and agile software development. As a result, society should be upended by partial AGI innovations many times over the coming decades, well before fully human-level AGI is finished.

#3: Once a system "proves its mettle by attaining human-level intelligence", funding for hardware could multiply.

I agree that funding for AI could multiply manyfold due to a sudden change in popular attention or political dynamics. But I'm thinking of something like a factor of 10 or maybe 50 in an all-out Cold War-style arms race. A factor-of-50 boost in hardware isn't obviously that important. If before there was one human-level AI, there would now be 50. In any case, I expect the Sputnik moment(s) for AI to happen well before it achieves a human level of ability. Companies and militaries aren't stupid enough not to invest massively in an AI with almost-human intelligence.

#4: Once the human level of intelligence is reached, "Researchers may work harder, [and] more researchers may be recruited".

As with hardware above, I would expect these "shit hits the fan" moments to happen before fully human-level AI. In any case:

• It's not clear there would be enough AI specialists to recruit in a short time. Other quantitatively minded people could switch to AI work, but they would presumably need years of experience to produce cutting-edge insights.

• The number of people thinking about AI safety, ethics, and social implications should also multiply during Sputnik moments. So the ratio of AI policy work to total AI work might not change relative to slower takeoffs, even if the physical time scales would compress.

#5: At some point, the AI's self-improvements would dominate those of human engineers, leading to exponential growth.

I discussed this in the "Intelligence explosion?" section above. A main point is that we see many other systems, such as the world economy or Moore's law, that also exhibit positive feedback and hence exponential growth, but these aren't "fooming" at an astounding rate. It's not clear why an AI's self-improvement – which resembles economic growth and other complex phenomena – should suddenly explode faster (in subjective time) than humanity's existing recursive self-improvement of its intelligence via digital computation.

On the other hand, maybe the difference between subjective and objective time is important. If a human-level AI could think, say, 10,000 times faster than a human, then assuming linear scaling, it would be worth 10,000 engineers. By the time of human-level AI, I expect there would be far more than 10,000 AI developers on Earth, but given enough hardware, the AI could copy itself manyfold until its subjective time far exceeded that of human experts. The speed and copiability advantages of digital minds seem perhaps the strongest arguments for a takeoff that happens rapidly relative to human observers. Note that, as Hanson said above, this digital speedup might be just a one-time boost, rather than a permanently higher rate of growth, but even the one-time boost could be enough to radically alter the power dynamics of humans vis-à-vis machines. That said, there should be plenty of slightly sub-human AIs by this time, and maybe they could fill some speed gaps on behalf of biological humans.
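As a rough back-of-the-envelope illustration of the speed and copiability point (all numbers here are hypothetical assumptions, not claims from the text):

# Subjective engineering effort from fast digital copies, with assumed numbers.
speedup = 10_000   # assumed subjective speedup over a human thinker
copies = 1_000     # assumed number of copies running in parallel
years = 1          # objective (calendar) years elapsed

subjective_engineer_years = speedup * copies * years
print(subjective_engineer_years)   # 10,000,000 engineer-years per calendar year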

In general, it's a mistake to imagine human-level AI against a backdrop of our current world. That's like imagining a Tyrannosaurus rex in a human city. Rather, the world will look very different by the time human-level AI arrives. Many of the intermediate steps on the path to general AI will be commercially useful and thus should diffuse widely in the meanwhile. As user "HungryHobo" noted: "If you had a near human level AI, odds are, everything that could be programmed into it at the start to help it with software development is already going to be part of the suites of tools for helping normal human programmers." Even if AI research becomes nationalized and confidential, its developers should still have access to almost-human-level digital-speed AI tools, which should help smooth the transition. For instance, Bostrom (2014) mentions how in the 2010 flash crash (Box 2, p. 17), a high-speed positive-feedback spiral was terminated by a high-speed "circuit breaker". This is already an example where problems happening faster than humans could comprehend them were averted due to solutions happening faster than humans could comprehend them. See also the discussion of "tripwires" in Superintelligence (Bostrom, 2014, p. 137).

Conversely, many globally disruptive events may happen well before fully human AI arrives, since even sub-human AI may be prodigiously powerful.

#6: "even when the outside world has a greater total amount of relevant research capability than any one project", the optimization power of the project might be more important than that of the world "since much of the outside world's capability is not focused on the particular system in question". Hence, the project might take off and leave the world behind. (Box 4, p. 75)

What one makes of this argument depends on how many people are needed to engineer how much progress. The Watson system that played on Jeopardy! required 15 people over ∼4(?) years[4] – given the existing tools of the rest of the world at that time, which had been developed by millions (indeed, billions) of other people. Watson was a much smaller leap forward than that needed to give a general intelligence a take-over-the-world advantage. How many more people would be required to achieve such a radical leap in intelligence? This seems to be a main point of contention in the debate between believers in soft vs. hard takeoff. The Manhattan Project required 100,000 scientists, and atomic bombs seem much easier to invent than general AI.

[4] Or maybe more? Nikola Danaylov reports rumored estimates of $50-150 million for Watson's R&D.

9 How complex is the brain?

Can we get insight into how hard general intelligence is based on neuroscience? Is the human brain fundamentally simple or complex?

9.1 One basic algorithm?

Jeff Hawkins, Andrew Ng, and others speculate that the brain may have one fundamental algorithm for intelligence – deep learning in the cortical column. This idea gains plausibility from the brain's plasticity. For instance, blind people can appropriate the visual cortex for auditory processing. Artificial neural networks can be used to classify any kind of input – not just visual and auditory but even highly abstract, like features about credit-card fraud or stock prices.

Maybe there's one fundamental algorithm for input classification, but this doesn't imply one algorithm for all that the brain does. Beyond the cortical column, the brain has many specialized structures that seem to perform very specialized functions, such as reward learning in the basal ganglia, fear processing in the amygdala, etc. Of course, it's not clear how essential all of these parts are or how easy it would be to replace them with artificial components performing the same basic functions.

One argument for faster AGI takeoffs is that humans have been able to learn many sophisticated things (e.g., advanced mathematics, music, writing, programming) without requiring any genetic changes. And what we now know doesn't seem to represent any kind of limit to what we could know with more learning. The human collection of cognitive algorithms is very flexible, which seems to belie claims that all intelligence requires specialized designs. On the other hand, even if human genes haven't changed much in the last 10,000 years, human culture has evolved substantially, and culture undergoes slow trial-and-error evolution in similar ways as genes do. So one could argue that human intellectual achievements are not fully general but rely on a vast amount of specialized, evolved content. Just as a single random human isolated from society probably couldn't develop general relativity on his own in a lifetime, so a single random human-level AGI probably couldn't either. Culture is the new genome, and it progresses slowly.

Moreover, some scholars believe that certain human abilities, such as language, are very essentially based on genetic hard-wiring:

The approach taken by Chomsky and Marr toward understanding how our minds achieve what they do is as different as can be from behaviorism. The emphasis here is on the internal structure of the system that enables it to perform a task, rather than on external association between past behavior of the system and the environment. The goal is to dig into the "black box" that drives the system and describe its inner workings, much like how a computer scientist would explain how a cleverly designed piece of software works and how it can be executed on a desktop computer.

Chomsky himself notes:

There's a fairly recent book by a very good cognitive neuroscientist, Randy Gallistel and King, arguing – in my view, plausibly – that neuroscience developed kind of enthralled to associationism and related views of the way humans and animals work. And as a result they've been looking for things that have the properties of associationist psychology.

[...] Gallistel has been arguing for years that if you want to study the brain properly you should begin, kind of like Marr, by asking what tasks is it performing. So he's mostly interested in insects. So if you want to study, say, the neurology of an ant, you ask what does the ant do? It turns out the ants do pretty complicated things, like path integration, for example. If you look at bees, bee navigation involves quite complicated computations, involving position of the sun, and so on and so forth. But in general what he argues is that if you take a look at animal cognition, human too, it's computational systems.

Many parts of the human body, like the diges-tive system or bones/muscles, are extremelycomplex and fine-tuned, yet few people arguethat their development is controlled by learn-ing. So it’s not implausible that a lot of thebrain’s basic architecture could be similarlyhard-coded.Typically AGI researchers express scorn for

Typically AGI researchers express scorn for manually tuned software algorithms that don't rely on fully general learning. But Chomsky's stance challenges that sentiment. If Chomsky is right, then a good portion of human "general intelligence" is finely tuned, hard-coded software of the sort that we see in non-AI branches of software engineering. And this view would suggest a slower AGI takeoff because time and experimentation are required to tune all the detailed, specific algorithms of intelligence.

9.2 Ontogenetic development

A full-fledged superintelligence probably requires very complex design, but it may be possible to build a "seed AI" that would recursively self-improve toward superintelligence. Turing (1950) proposed this in "Computing machinery and intelligence":

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationer's. Rather little mechanism, and lots of blank sheets. (Mechanism and writing are from our point of view almost synonymous.) Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed.

Animal development appears to be at least somewhat robust based on the fact that the growing organisms are often functional despite a few genetic mutations and variations in prenatal and postnatal environments. Such variations may indeed make an impact – e.g., healthier development conditions tend to yield more physically attractive adults – but most humans mature successfully over a wide range of input conditions.

On the other hand, an argument against the simplicity of development is the immense complexity of our DNA. It accumulated over billions of years through vast numbers of evolutionary "experiments". It's not clear that human engineers could perform enough measurements to tune ontogenetic parameters of a seed AI in a short period of time. And even if the parameter settings worked for early development, they would probably fail for later development. Rather than a seed AI developing into an "adult" all at once, designers would develop the AI in small steps, since each next stage of development would require significant tuning to get right.

Think about how much effort is required for human engineers to build even relatively simple systems. For example, I think the number of developers who work on Microsoft Office is in the thousands. Microsoft Office is complex but is still far simpler than a mammalian brain. Brains have lots of little parts that have been fine-tuned. That kind of complexity requires immense work by software developers to create. The main counterargument is that there may be a simple meta-algorithm that would allow an AI to bootstrap to the point where it could fine-tune all the details on its own, without requiring human inputs. This might be the case, but my guess is that any elegant solution would be hugely expensive computationally. For instance, biological evolution was able to fine-tune the human brain, but it did so with immense amounts of computing power over millions of years.

10 Brain quantity vs. quality

A common analogy for the gulf between superintelligence and humans is the gulf between humans and chimpanzees. In Consciousness Explained, Daniel Dennett (1992, pp. 189-190) mentions how our hominid ancestors had brains roughly four times the volume of those of chimps but roughly the same in structure. This might incline one to imagine that brain size alone could yield superintelligence. Maybe we'd just need to quadruple human brains once again to produce superintelligent humans? If so, wouldn't this imply a hard takeoff, since quadrupling hardware is relatively easy?

But in fact, as Dennett explains, the quadrupling of brain size from chimps to pre-humans was completed before the advent of language, cooking, agriculture, etc. In other words, the main "foom" of humans came from culture rather than brain size per se – from software in addition to hardware. Yudkowsky (2013) seems to agree: "Humans have around four times the brain volume of chimpanzees, but the difference between us is probably mostly brain-level cognitive algorithms."

But cultural changes (software) arguably progress a lot more slowly than hardware. The intelligence of human society has grown exponentially, but it's a slow exponential, and rarely have there been innovations that allowed one group to quickly overpower everyone else within the same region of the world. (Between isolated regions of the world the situation was sometimes different – e.g., Europeans with Maxim guns overpowering Africans because of very different levels of industrialization.)

11 More impact in hard-takeoff scenarios?

Some, including Owen Cotton-Barratt and Toby Ord, have argued that even if we think soft takeoffs are more likely, there may be higher value in focusing on hard-takeoff scenarios because these are the cases in which society would have the least forewarning and the fewest people working on AI altruism issues. This is a reasonable point, but I would add that

• Maybe hard takeoffs are sufficiently improbable that focusing on them still doesn't have highest priority. (Of course, some exploration of fringe scenarios is worthwhile.) There may be important advantages to starting early in shaping how society approaches soft takeoffs, and if a soft takeoff is very likely, those efforts may have more expected impact.

• Thinking about the most likely AI outcomes rather than the most impactful outcomes also gives us a better platform on which to contemplate other levers for shaping the future, such as non-AI emerging technologies, international relations, governance structures, values, etc. Focusing on a tail AI scenario doesn't inform non-AI work very well because that scenario probably won't happen. Promoting antispeciesism matters whether there's a hard or soft takeoff (indeed, maybe more in the soft-takeoff case), so our model of how the future will unfold should generally focus on likely scenarios.

In any case, the hard-soft distinction is not binary, and maybe the best place to focus is on scenarios where human-level AI takes over on a time scale of a few years. (Timescales of months, days, or hours strike me as pretty improbable, unless, say, Skynet gets control of nuclear weapons.)

In Superintelligence, Nick Bostrom (2014) suggests (Ch. 4, p. 64) that "Most preparations undertaken before onset of [a] slow takeoff would be rendered obsolete as better solutions would gradually become visible in the light of the dawning era." Toby Ord uses the term "nearsightedness" to refer to the ways in which research done too far in advance of an issue's emergence may not be as useful as research done when more is known about the issue. Ord contrasts this with benefits of starting early, including course-setting. I think Ord's counterpoints argue against the contention that early work wouldn't matter that much in a slow takeoff. Some of how society responds to AI surpassing human intelligence might depend on early frameworks and memes.

(For instance, consider the lingering impact of Terminator imagery on almost any present-day popular-media discussion of AI risk.) Some fundamental work would probably not be overthrown by later discoveries; for instance, algorithmic-complexity bounds of key algorithms were discovered decades ago but will remain relevant until intelligence dies out, possibly billions of years from now. Some non-technical policy and philosophy work would be less obsoleted by changing developments. And some AI preparation would be relevant both in the short term and the long term. Slow AI takeoff to reach the human level is already happening, and more minds should be exploring these questions well in advance.

Making a related though slightly different point, Bostrom (2014) argues in Superintelligence (Ch. 5, pp. 85-86) that individuals might play more of a role in cases where elites and governments underestimate the significance of AI: "Activists seeking maximum expected impact may therefore wish to focus most of their planning on [scenarios where governments come late to the game], even if they believe that scenarios in which big players end up calling all the shots are more probable." Again I would qualify this with the note that we shouldn't confuse "acting as if" governments will come late with believing they actually will come late when thinking about most likely future scenarios.

Even if one does wish to bet on low-probability, high-impact scenarios of fast takeoff and governmental neglect, this doesn't speak to whether or how we should push on takeoff speed and governmental attention themselves. Following are a few considerations.

Takeoff speed

• In favor of fast takeoff:
  – A singleton is more likely, thereby averting possibly disastrous conflict among AIs.
  – If one prefers uncontrolled AI, fast takeoffs seem more likely to produce them.

• In favor of slow takeoff:
  – More time for many parties to participate in shaping the process, compromising, and developing less damaging pathways to AI takeoff.
  – If one prefers controlled AI, slow takeoffs seem more likely to produce them in general. (There are some exceptions. For instance, fast takeoff of an AI built by a very careful group might remain more controlled than an AI built by committees and messy politics.)

Amount of government/popular attention to AI

• In favor of more:
  – Would yield much more reflection, discussion, negotiation, and pluralistic representation.
  – If one favors controlled AI, it's plausible that multiplying the number of people thinking about AI would multiply consideration of failure modes.
  – Public pressure might help curb arms races, in analogy with public opposition to nuclear arms races.

• In favor of less:
  – Wider attention to AI might accelerate arms races rather than inducing cooperation on more circumspect planning.
  – The public might freak out and demand counterproductive measures in response to the threat.
  – If one prefers uncontrolled AI, that outcome may be less likely with many more human eyes scrutinizing the issue.

12 Village idiot vs. Einstein

One of the strongest arguments for hard takeoff is this one by Yudkowsky:

the distance from "village idiot" to "Einstein" is tiny, in the space of brain designs.

Or as Scott Alexander put it:

It took evolution twenty million years to go from cows with sharp horns to hominids with sharp spears; it took only a few tens of thousands of years to go from hominids with sharp spears to moderns with nuclear weapons.

I think we shouldn’t take relative evolution-ary timelines at face value, because most ofthe previous 20 million years of mammalianevolution weren’t focused on improving humanintelligence; most of the evolutionary selectionpressure was directed toward optimizing othertraits. In contrast, cultural evolution placesgreater emphasis on intelligence because thattrait is more important in human society thanit is in most animal fitness landscapes.Still, the overall point is important: The

Still, the overall point is important: The tweaks to a brain needed to produce human-level intelligence may not be huge compared with the designs needed to produce chimp intelligence, but the differences in the behaviors of the two systems, when placed in a sufficiently information-rich environment, are huge.

Nonetheless, I incline toward thinking that the transition from human-level AI to an AI significantly smarter than all of humanity combined would be somewhat gradual (requiring at least years if not decades) because the absolute scale of improvements needed would still be immense and would be limited by hardware capacity. But if hardware becomes many orders of magnitude more efficient than it is today, then things could indeed move more rapidly.


13 A case for epistemic modesty on AI timelines

Estimating how long a software project will take to complete is notoriously difficult. Even if I've completed many similar coding tasks before, when I'm asked to estimate the time to complete a new coding project, my estimate is often wrong by a factor of 2 and sometimes wrong by a factor of 4, or even 10. Insofar as the development of AGI (or other big technologies, like nuclear fusion) is a big software (or, more generally, engineering) project, it's unsurprising that we'd see similarly dramatic failures of estimation on timelines for these bigger-scale achievements.

A corollary is that we should maintain some modesty about AGI timelines and takeoff speeds. If, say, 100 years is your median estimate for the time until some agreed-upon form of AGI, then there's a reasonable chance you'll be off by a factor of 2 (suggesting AGI within 50 to 200 years), and you might even be off by a factor of 4 (suggesting AGI within 25 to 400 years). Similar modesty applies for estimates of takeoff speed from human-level AGI to super-human AGI, although I think we can largely rule out extreme takeoff speeds (like achieving performance far beyond human abilities within hours or days) based on fundamental reasoning about the computational complexity of what's required to achieve superintelligence.
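
To make the arithmetic of such multiplicative error bands concrete, here is a minimal sketch in Python; the median of 100 years and the factors of 2 and 4 are just the illustrative numbers from the paragraph above, not calibrated forecasts.

    # Multiplicative error bands around a median AGI-timeline estimate.
    # The median and factors are the illustrative numbers from the text.
    median_years = 100

    for factor in (2, 4):
        low, high = median_years / factor, median_years * factor
        print(f"Off by a factor of {factor}: AGI within {low:.0f} to {high:.0f} years")

    # Prints:
    # Off by a factor of 2: AGI within 50 to 200 years
    # Off by a factor of 4: AGI within 25 to 400 years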

My bias is generally to assume that a given technology will take longer to develop than what you hear about in the media, (a) because of the planning fallacy and (b) because those who make more audacious claims are more interesting to report about. Believers in "the singularity" are not necessarily wrong about what's technically possible in the long term (though sometimes they are), but the reason enthusiastic singularitarians are considered "crazy" by more mainstream observers is that singularitarians expect change much faster than is realistic. AI turned out to be much harder than the Dartmouth Conference participants expected. Likewise, nanotech is progressing more slowly and incrementally than the starry-eyed proponents predicted.

14 Intelligent robots in your backyard

Many nature-lovers are charmed by the behavior of animals but find computers and robots to be cold and mechanical. Conversely, some computer enthusiasts may find biology to be soft and boring compared with digital creations. However, the two domains share a surprising amount of overlap. Ideas of optimal control, locomotion kinematics, visual processing, system regulation, foraging behavior, planning, reinforcement learning, etc. have been fruitfully shared between biology and robotics. Neuroscientists sometimes look to the latest developments in AI to guide their theoretical models, and AI researchers are often inspired by neuroscience, such as with neural networks and in deciding what cognitive functionality to implement.

I think it's helpful to see animals as being intelligent robots. Organic life has a wide diversity, from unicellular organisms through humans and potentially beyond, and so too can robotic life. The rigid conceptual boundary that many people maintain between "life" and "machines" is not warranted by the underlying science of how the two types of systems work. Different types of intelligence may sometimes converge on the same basic kinds of cognitive operations, and especially from a functional perspective – when we look at what the systems can do rather than how they do it – it seems to me intuitive that human-level robots would deserve human-level treatment, even if their underlying algorithms were quite dissimilar.

Whether robot algorithms will in fact be dissimilar from those in human brains depends on how much biological inspiration the designers employ and how convergent human-type mind design is for being able to perform robotic tasks in a computationally efficient manner. Some classical robotics algorithms rely mostly on mathematical problem definition and optimization; other modern robotics approaches use biologically plausible reinforcement learning and/or evolutionary selection. (In one YouTube video about robotics, I saw that someone had written a comment to the effect that "This shows that life needs an intelligent designer to be created." The irony is that some of the best robotics techniques use evolutionary algorithms. Of course, there are theists who say God used evolution but intervened at a few points, and that would be an apt description of evolutionary robotics.)
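
As a toy illustration of the evolutionary approach mentioned above, the following sketch (in Python) evolves a small parameter vector for a hypothetical robot controller against a made-up fitness function; the fitness function, population size, and mutation scale are arbitrary placeholders rather than anything taken from the robotics literature.

    import random

    # Toy evolutionary search over a robot controller's parameters.
    # "Fitness" is a stand-in for, e.g., distance walked in a simulator.
    def fitness(params):
        # Hypothetical objective: best when every parameter equals 0.5.
        return -sum((p - 0.5) ** 2 for p in params)

    def mutate(params, scale=0.1):
        return [p + random.gauss(0, scale) for p in params]

    population = [[random.random() for _ in range(4)] for _ in range(20)]

    for generation in range(50):
        # Keep the fitter half and refill the population with mutated copies.
        population.sort(key=fitness, reverse=True)
        survivors = population[:10]
        population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

    best = max(population, key=fitness)
    print("Best controller parameters:", [round(p, 2) for p in best])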

The distinction between AI and AGI is somewhat misleading, because it may incline one to believe that general intelligence is somehow qualitatively different from simpler AI. In fact, there's no sharp distinction; there are just different machines whose abilities have different degrees of generality. A critic of this claim might reply that bacteria would never have invented calculus. My response is as follows. Most people couldn't have invented calculus from scratch either, but over a long enough period of time, eventually the collection of humans produced enough cultural knowledge to make the development possible. Likewise, if you put bacteria on a planet long enough, they too may develop calculus, by first evolving into more intelligent animals who can then go on to do mathematics. The difference here is a matter of degree: The simpler machines that bacteria are take vastly longer to accomplish a given complex task.

Just as Earth's history saw a plethora of animal designs before the advent of humans, so I expect a wide assortment of animal-like (and plant-like) robots to emerge in the coming decades, well before human-level AI. Indeed, we've already had basic robots for many decades (or arguably even millennia).

These will grow gradually more sophisticated, and as we converge on robots with the intelligence of birds and mammals, AI and robotics will become dinner-table conversation topics. Of course, I don't expect the robots to have the same sets of skills as existing animals. Deep Blue had chess-playing abilities beyond any animal, while in other domains it was less efficacious than a blade of grass. Robots can mix and match cognitive and motor abilities without strict regard for the order in which evolution created them.

And of course, humans are robots too. When I finally understood this around 2009, it was one of the biggest paradigm shifts of my life. If I picture myself as a robot operating on an environment, the world makes a lot more sense. I also find this perspective can be therapeutic to some extent. If I experience an unpleasant emotion, I think about myself as a robot whose cognition has been temporarily afflicted by a negative stimulus and reinforcement process. I then think how the robot has other cognitive processes that can counteract the suffering computations and prevent them from amplifying. The ability to see myself "from the outside" as a third-person series of algorithms helps deflate the impact of unpleasant experiences, because it's easier to "observe, not judge" when viewing a system in mechanistic terms. Compare with dialectical behavior therapy and mindfulness.

15 Is automation "for free"?

When we use machines to automate a repetitive manual task formerly done by humans, we talk about getting the task done "automatically" and "for free," because we say that no one has to do the work anymore. Of course, this isn't strictly true: The computer/robot now has to do the work. Maybe what we actually mean is that no one is going to get bored doing the work, and we don't have to pay that worker high wages. When intelligent humans do boring tasks, it's a waste of their spare CPU cycles.

Sometimes we adopt a similar mindset about automation toward superintelligent machines. In "Speculations Concerning the First Ultraintelligent Machine", I. J. Good (1965) wrote:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines [...]. Thus the first ultraintelligent machine is the last invention that man need ever make [...].

Ignoring the question of whether these future innovations are desirable, we can ask: Does all the AI design work that comes after humans come for free? It comes for free in the sense that humans aren't doing it. But the AIs have to do it, and it takes a lot of mental work on their parts. Given that they're at least as intelligent as humans, I think it doesn't make sense to picture them as mindless automatons; rather, they would have rich inner lives, even if those inner lives have a very different nature than our own. Maybe they wouldn't experience the same effortfulness that humans do when innovating, but even this isn't clear, because measuring your effort in order to avoid spending too many resources on a task without payoff may be a useful design feature of AI minds too. When we picture ourselves as robots along with our AI creations, we can see that we are just one point along a spectrum of the growth of intelligence. Unicellular organisms, when they evolved the first multi-cellular organism, could likewise have said, "That's the last innovation we need to make. The rest comes for free."

16 Caring about the AI’s goals

Movies typically portray rebellious robots or AIs as the "bad guys" who need to be stopped by heroic humans. This dichotomy plays on our us-vs.-them intuitions, which favor our tribe against the evil, alien-looking outsiders. We see similar dynamics at play to a lesser degree when people react negatively against "foreigners stealing our jobs" or "Asians who are outcompeting us." People don't want their kind to be replaced by another kind that has an advantage.

the AI’s perspective, we might feel differently.Anthropomorphizing an AI’s thoughts is arecipe for trouble, but regardless of the spe-cific cognitive operations, we can see at a highlevel that the AI "feels" (in at least a poeticsense) that what it’s trying to accomplish isthe most important thing in the world, andit’s trying to figure out how it can do that inthe face of obstacles. Isn’t this just what wedo ourselves?This is one reason it helps to really inter-

This is one reason it helps to really internalize the fact that we are robots too. We have a variety of reward signals that drive us in various directions, and we execute behavior aiming to increase those rewards. Many modern-day robots have much simpler reward structures and so may seem more dull and less important than humans, but it's not clear this will remain true forever, since navigating in a complex world probably requires a lot of special-case heuristics and intermediate rewards, at least until enough computing power becomes available for more systematic and thorough model-based planning and action selection.

Suppose an AI hypothetically eliminated humans and took over the world. It would develop an array of robot assistants of various shapes and sizes to help it optimize the planet. These would perform simple and complex tasks, would interact with each other, and would share information with the central AI command. From an abstract perspective, some of these dynamics might look like ecosystems in the present day, except that they would lack inter-organism competition. Other parts of the AI's infrastructure might look more industrial. Depending on the AI's goals, perhaps it would be more effective to employ nanotechnology and programmable matter rather than macro-scale robots. The AI would develop virtual scientists to learn more about physics, chemistry, computer hardware, and so on. They would use experimental laboratory and measurement techniques but could also probe depths of structure that are only accessible via large-scale computation. Digital engineers would plan how to begin colonizing the solar system. They would develop designs for optimizing matter to create more computing power, and for ensuring that those helper computing systems remained under control. The AI would explore the depths of mathematics and AI theory, proving beautiful theorems that it would value highly, at least instrumentally. The AI and its helpers would proceed to optimize the galaxy and beyond, fulfilling their grandest hopes and dreams.

When phrased this way, we might think that a "rogue" AI would not be so bad. Yes, it would kill humans, but compared against the AI's vast future intelligence, humans would be comparable to the ants on a field that get crushed when an art gallery is built on that land. Most people don't have qualms about killing a few ants to advance human goals. An analogy of this sort is discussed in Artificial Intelligence: A Modern Approach (Russell, Norvig, Canny, Malik, & Edwards, 2003). (Perhaps the AI analogy suggests a need to revise our ethical attitudes toward arthropods? That said, I happen to think that in this case, ants on the whole benefit from the art gallery's construction because ant lives contain so much suffering.)

Some might object that sufficiently mathematical AIs would not "feel" the happiness of accomplishing their "dreams." They wouldn't be conscious because they wouldn't have the high degree of network connectivity that human brains embody. Whether we agree with this assessment depends on how broadly we define consciousness and feelings. To me it appears chauvinistic to adopt a view according to which an agent that has vastly more domain-general intelligence and agency than you is still not conscious in a morally relevant sense. This seems to indicate a lack of openness to the diversity of mind-space. What if you had grown up with the cognitive architecture of this different mind? Wouldn't you care about your goals then? Wouldn't you plead with agents of a different mind constitution to consider your values and interests too?

In any event, it's possible that the first super-human intelligence will consist in a brain upload rather than a bottom-up AI, and most of us would regard this as conscious.

17 Rogue AI would not share our values

Even if we would care about a rogue AI for its own sake and the sakes of its vast helper minions, this doesn't mean rogue AI is a good idea. We're likely to have different values from the AI, and the AI would not by default advance our values without being programmed to do so. Of course, one could allege that privileging some values above others is chauvinistic in a similar way as privileging some intelligence architectures is, but if we don't care more about some values than others, we wouldn't have any reason to prefer any outcome over any other outcome. (Technically speaking, there are other possibilities besides privileging our values or being indifferent to all events. For instance, we could privilege equally any values held by some actual agent – not just random hypothetical values – and in this case, we wouldn't have a preference between the rogue AI and humans, but we would have a preference for one of those over something arbitrary.)


There are many values that would not necessarily be respected by a rogue AI. Most people care about their own life, their children, their neighborhood, the work they produce, and so on. People may intrinsically value art, knowledge, religious devotion, play, humor, etc. Yudkowsky values complex challenges and worries that many rogue AIs – while they would study the depths of physics, mathematics, engineering, and maybe even sociology – might spend most of their computational resources on routine, mechanical operations that he would find boring. (Of course, the robots implementing those repetitive operations might not agree. As Hedonic Treader noted: "Think how much money and time people spend on having - relatively repetitive - sexual experiences. [...] It's just mechanical animalistic idiosyncratic behavior. Yes, there are variations, but let's be honest, the core of the thing is always essentially the same.")

In my case, I care about reducing and preventing suffering, and I would not be pleased with a rogue AI that ignored the suffering its actions might entail, even if it was fulfilling its innermost purpose in life. But would a rogue AI produce much suffering beyond Earth? The next section explores this further.

18 Would a human-inspired AI or rogue AI cause more suffering?

In popular imagination, takeover by a rogue AI would end suffering (and happiness) on Earth by killing all biological life. It would also, so the story goes, end suffering (and happiness) on other planets as the AI mined them for resources. Thus, looking strictly at the suffering dimension of things, wouldn't a rogue AI imply less long-term suffering?

Not necessarily, because while the AI might destroy biological life (perhaps after taking samples, saving specimens, and conducting lab experiments for future use), it would create a bounty of digital life, some containing goal systems that we would recognize as having moral relevance. Non-upload AIs would probably have less empathy than humans, because some of the factors that led to the emergence of human empathy – particularly parenting – would not apply to them.

Following are some made-up estimates of how much suffering might result from a typical rogue AI, in arbitrary units. Suffering is represented as a negative number, and prevented suffering is positive.

• -20 from suffering subroutines in robot workers, virtual scientists, internal computational subcomponents of the AI, etc. (This could be very significant if lots of intelligent robots are used or perhaps less significant if the industrial operations are mostly done at nano-scale by simple processors. If the paperclip factories that a notional paperclip maximizer would build are highly uniform, robots may not require animal-like intelligence or learning to work within them but could instead use some hard-coded, optimally efficient algorithm, similar to what happens in a present-day car factory. However, first setting up the paperclip factories on each different planet with different environmental conditions might require more general, adaptive intelligence.)

• -80 from lab experiments, science investigations, and explorations of mind-space without the digital equivalent of anaesthesia. One reason to think lots of detailed simulations would be required here is Stephen Wolfram's principle of computational irreducibility. Ecosystems, brains, and other systems that are important for an AI to know about may be too complex to accurately study with only simple models; instead, they may need to be simulated in large numbers and with fine-grained detail.

• -10? from the possibility that an uncontrolled AI would do things that humans regard as crazy or extreme, such as spending all its resources on studying physics to determine whether there exists a button that would give astronomically more utility than any other outcome. Humans seem less likely to pursue strange behaviors of this sort. Of course, most such strange behaviors would be not that bad from a suffering standpoint, but perhaps a few possible behaviors could be extremely bad, such as running astronomical numbers of painful scientific simulations to determine the answer to some question. (Of course, we should worry whether humans might also do extreme computations, and perhaps their extreme computations would be more likely to be full of suffering because humans are more interested in agents with human-like minds than a generic AI is.)

• -100 in expectation from black-swan possibilities in which the AI could manipulate physics to make the multiverse bigger, last longer, contain vastly more computation, etc.

What about for a human-inspired AI? Again, here are made-up numbers:

• -30 from suffering subroutines. One reason to think these could be less bad in a human-controlled future is that human empathy may allow for more humane algorithm designs. On the other hand, human-controlled AIs may need larger numbers of intelligent and sentient sub-processes because human values are more complex and varied than paperclip production is. Also, human values tend to require continual computation (e.g., to simulate eudaimonic experiences), while paperclips, once produced, are pretty inert and might last a long time before they would wear out and need to be recreated. (Of course, most uncontrolled AIs wouldn't produce literal paperclips. Some would optimize for values that would require constant computation.)

• -60 from lab experiments, science investigations, etc. (again lower than for a rogue AI because of empathy; compare with efforts to reduce the pain of animal experimentation)

• -0.2 if environmentalists insist on preserving terrestrial and extraterrestrial wild-animal suffering

• -3 for environmentalist simulations of nature

• -100 due to intrinsically valued simulations that may contain nasty occurrences. These might include, for example, violent video games that involve killing conscious monsters. Or incidental suffering that people don't care about (e.g., insects being eaten by spiders on the ceiling of the room where a party is happening). This number is high not because I think most human-inspired simulations would contain intense suffering but because, in some scenarios, there might be very large numbers of simulations run for reasons of intrinsic human value, and some of these might contain horrific experiences. This video discusses one of many possible reasons why intrinsically valued human-created simulations might contain significant suffering.

• -15 if sadists have access to computational power (humans are not only more empathetic but also more sadistic than AIs)

• -70 in expectation from black-swan ways to increase the amount of physics that exists (humans seem likely to want to do this, although some might object to, e.g., re-creating the Holocaust in new parts of the cosmos)

• +50 for discovering ways to reduce suffering that we can't imagine right now ("black swans that don't cut both ways"). Unfortunately, humans might also respond to some black swans in worse ways than uncontrolled AIs would, such as by creating more total animal-like minds.

Perhaps some AIs would not want to expand the multiverse, assuming this is even possible. For instance, if they had a minimizing goal function (e.g., eliminate cancer), they would want to shrink the multiverse. In this case, the physics-based suffering number would go from -100 to something positive, say, +50 (if, say, it's twice as easy to expand as to shrink). I would guess that minimizers are less common than maximizers, but I don't know how much. Plausibly a sophisticated AI would have components of its goal system in both directions, because the combination of pleasure and pain seems to be more successful than either in isolation.
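
To make the bookkeeping explicit, here is a minimal sketch in Python that tallies the made-up numbers from the two lists above; all figures are the arbitrary illustrative units used there, and the "minimizer" variant is the hypothetical +50 swap just described.

    # Tally the illustrative (made-up) suffering estimates from the lists above.
    # Negative numbers are expected suffering; positive numbers are prevented suffering.
    rogue_ai = {
        "suffering subroutines": -20,
        "lab experiments and mind-space exploration": -80,
        "crazy or extreme projects": -10,
        "black swans that enlarge physics": -100,
    }

    human_inspired_ai = {
        "suffering subroutines": -30,
        "lab experiments and science": -60,
        "preserved wild-animal suffering": -0.2,
        "environmentalist nature simulations": -3,
        "intrinsically valued simulations": -100,
        "sadists with computing power": -15,
        "black swans that enlarge physics": -70,
        "novel suffering-reduction discoveries": 50,
    }

    def total(estimates):
        return sum(estimates.values())

    print("Rogue AI total:", total(rogue_ai))                    # -210
    print("Human-inspired AI total:", total(human_inspired_ai))  # -228.2

    # Hypothetical variant: a minimizer rogue AI that shrinks rather than
    # expands physics, swapping the -100 black-swan term for +50 as above.
    minimizer_rogue = dict(rogue_ai, **{"black swans that enlarge physics": 50})
    print("Minimizer rogue AI total:", total(minimizer_rogue))   # -60

The point of such a tally is only to make the structure of the comparison transparent; as the text goes on to emphasize, the numbers themselves carry huge error bars.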

Another consideration is the unpleasant possibility that humans might get AI value loading almost right but not exactly right, leading to immense suffering as a result. For example, suppose the AI's designers wanted to create tons of simulated human lives to reduce astronomical waste (Bostrom, 2003), but when the AI actually created those human simulations, they weren't perfect replicas of biological humans, perhaps because the AI skimped on detail in order to increase efficiency. The imperfectly simulated humans might suffer from mental disorders, might go crazy due to being in alien environments, and so on. Does work on AI safety increase or decrease the risk of outcomes like these? On the one hand, the probability of this outcome is near zero for an AGI with completely random goals (such as a literal paperclip maximizer), since paperclips are very far from humans in design-space. The risk of accidentally creating suffering humans is higher for an almost-friendly AI that goes somewhat awry and then becomes uncontrolled, preventing it from being shut off. A successfully controlled AGI seems to have lower risk of a bad outcome, since humans should recognize the problem and fix it.

So the risk of this type of dystopic outcome may be highest in a middle ground where AI safety is sufficiently advanced to yield AI goals in the ballpark of human values but not advanced enough to ensure that human values remain in control.

The above analysis has huge error bars, and maybe other considerations that I haven't mentioned dominate everything else. This question needs much more exploration, because it has implications for whether those who care mostly about reducing suffering should focus on mitigating AI risk or if other projects have higher priority.

Even if suffering reducers don't focus on conventional AI safety, they should probably remain active in the AI field because there are many other ways to make an impact. For instance, just increasing dialogue on this topic may illuminate positive-sum opportunities for different value systems to each get more of what they want. Suffering reducers can also point out the possible ethical importance of lower-level suffering subroutines, which are not currently a concern even to most AI-literate audiences. And so on. There are probably many dimensions on which to make constructive, positive-sum contributions.

Also keep in mind that even if suffering reducers do encourage AI safety, they could try to push toward AI designs that, if they did fail, would produce less bad uncontrolled outcomes. For instance, getting AI control wrong and ending up with a minimizer would be vastly preferable to getting control wrong and ending up with a maximizer. There may be many other dimensions along which, even if the probability of control failure is the same, the outcome if control fails is preferable to other outcomes of control failure.

19 Would helper robots feel pain?

Consider an AI that uses moderately intelligent robots to build factories and carry out other physical tasks that can't be preprogrammed in a simple way. Would these robots feel pain in a similar fashion to animals? At least if they use algorithms somewhat similar to those animals use for navigating environments, avoiding danger, etc., it's plausible that such robots would feel something akin to stress, fear, and other drives to change their current state when things were going wrong.

However, the specific responses that such robots would have to specific stimuli or situations would differ from the responses that an evolved, selfish animal would have. For example, a well programmed helper robot would not hesitate to put itself in danger in order to help other robots or otherwise advance the goals of the AI it was serving. Perhaps the robot's "physical pain/fear" subroutines could be shut off in cases of altruism for the greater good, or else its decision processes could just override those selfish considerations when making choices requiring self-sacrifice.
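
As a toy illustration of the second option, here is a minimal sketch in Python of a decision rule in which a self-preservation penalty derived from a "pain" signal can be outweighed, or switched off entirely, when an action sufficiently serves the robot's assigned mission; all names and numbers are hypothetical, and this is only one of many ways such an override could be architected.

    # Toy decision rule: a helper robot weighs mission value against a
    # self-preservation penalty derived from its pain/fear signal.
    # All names and numbers are hypothetical illustrations.
    def choose_action(actions, altruism_mode=False):
        """Pick the action with the highest score.

        In altruism_mode the pain penalty is ignored entirely, mimicking a
        "shut off the pain subroutine" design; otherwise pain is just one
        (overridable) term in the score.
        """
        pain_weight = 0.0 if altruism_mode else 1.0

        def score(action):
            return action["mission_value"] - pain_weight * action["expected_pain"]

        return max(actions, key=score)

    actions = [
        {"name": "stay safe",       "mission_value": 1.0, "expected_pain": 0.0},
        {"name": "shield teammate", "mission_value": 5.0, "expected_pain": 3.0},
        {"name": "reckless rescue", "mission_value": 6.0, "expected_pain": 9.0},
    ]

    print(choose_action(actions)["name"])                      # shield teammate
    print(choose_action(actions, altruism_mode=True)["name"])  # reckless rescue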

Humans sometimes exhibit similar behavior, such as when a mother risks harm to save a child, or when monks burn themselves as a form of protest. And this kind of sacrifice is even more well known in eusocial insects, who are essentially robots produced to serve the colony's queen.

Sufficiently intelligent helper robots might experience "spiritual" anguish when failing to accomplish their goals. So even if chopping the head off a helper robot wouldn't cause "physical" pain – perhaps because the robot disabled its fear/pain subroutines to make it more effective in battle – the robot might still find such an event extremely distressing insofar as its beheading hindered the goal achievement of its AI creator.

20 How accurate would simulations be?

Suppose an AI wants to learn about the distribution of extraterrestrials in the universe. Could it do this successfully by simulating lots of potential planets and looking at what kinds of civilizations pop out at the end? Would there be shortcuts that would avoid the need to simulate lots of trajectories in detail?

Simulating trajectories of planets with extremely high fidelity seems hard. Unless there are computational shortcuts, it appears that one needs more matter and energy to simulate a given physical process to a high level of precision than what occurs in the physical process itself. For instance, to simulate a single protein folding currently requires supercomputers composed of huge numbers of atoms, and the rate of simulation is astronomically slower than the rate at which the protein folds in real life. Presumably superintelligence could vastly improve efficiency here, but it's not clear that protein folding could ever be simulated on a computer made of fewer atoms than are in the protein itself.

Translating this principle to a larger scale, it seems doubtful that one could simulate the precise physical dynamics of a planet on a computer smaller in size than that planet. So even if a superintelligence had billions of planets at its disposal, it would seemingly only be able to simulate at most billions of extraterrestrial worlds – even assuming it only simulated each planet by itself, not the star that the planet orbits around, cosmic-ray bursts, etc.

gence’s simulations would need to be coarser-grained than at the level of fundamental phys-ical operations in order to be feasible. For in-stance, the simulation could model most of aplanet at only a relatively high level of ab-straction and then focus computational detailon those structures that would be more impor-tant, like the cells of extraterrestrial organismsif they emerge.It’s plausible that the trajectory of any given

It's plausible that the trajectory of any given planet would depend sensitively on very minor details, in light of butterfly effects.

On the other hand, it's possible that long-term outcomes are mostly constrained by macro-level variables, like geography (Kaplan, 2013), climate, resource distribution, atmospheric composition, seasonality, etc. Even if short-term events are hard to predict (e.g., when a particular dictator will die), perhaps the end game of a civilization is more predetermined. Robert D. Kaplan: "The longer the time frame, I would say, the easier it is to forecast because you're dealing with broad currents and trends."

Even if butterfly effects, quantum randomness, etc. are crucial to the long-run trajectories of evolution and social development on any given planet, perhaps it would still be possible to sample a rough distribution of outcomes across planets with coarse-grained simulations?

In light of the apparent computational complexity of simulating basic physics, perhaps a superintelligence would do the same kind of experiments that human scientists do in order to study phenomena like abiogenesis: create laboratory environments that mimic the chemical, temperature, moisture, etc. conditions of various planets and see whether life emerges, and if so, what kinds. Thus, a future controlled by digital intelligence may not rely purely on digital computation but may still use physical experimentation as well. Of course, observing the entire biosphere of a life-rich planet would probably be hard to do in a laboratory, so computer simulations might be needed for modeling ecosystems. But assuming that molecule-level details aren't often essential to ecosystem simulations, coarser-grained ecosystem simulations might be computationally tractable. (Indeed, ecologists today already use very coarse-grained ecosystem simulations with reasonable success.)
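
As an illustration of how coarse-grained such models can be, here is a minimal sketch in Python of a classic two-species Lotka-Volterra predator-prey simulation; the rate constants, starting populations, and step size are arbitrary, and real ecosystem models track far more variables, but the point is that nothing molecule-level appears anywhere.

    # Coarse-grained predator-prey dynamics (Lotka-Volterra) integrated
    # with a simple Euler step. All constants are arbitrary illustrations.
    alpha, beta = 1.1, 0.4   # prey growth rate, predation rate
    delta, gamma = 0.1, 0.4  # predator growth per prey eaten, predator death rate

    prey, predators = 10.0, 5.0
    dt = 0.01

    for step in range(10000):
        dprey = (alpha * prey - beta * prey * predators) * dt
        dpred = (delta * prey * predators - gamma * predators) * dt
        prey += dprey
        predators += dpred
        if step % 2500 == 0:
            print(f"t={step * dt:5.1f}  prey={prey:7.2f}  predators={predators:6.2f}")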

21 Rogue AIs can take off slowly

One might get the impression that because I find slow AI takeoffs more likely, I think uncontrolled AIs are unlikely. This is not the case. Many uncontrolled intelligence explosions would probably happen softly though inexorably.

Consider the world economy. It is a complex system more intelligent than any single person – a literal superintelligence. Its dynamics imply a goal structure not held by humans directly; it moves with a mind of its own in directions that it "prefers". It recursively self-improves, because better tools, capital, knowledge, etc. enable the creation of even better tools, capital, knowledge, etc. And it acts roughly with the aim of maximizing output (of paperclips and other things). Thus, the economy is a kind of paperclip maximizer. (Thanks to a friend for first pointing this out to me.)

Cenk Uygur:

corporations are legal fictions. We created them. They are machines built for a purpose. [...] Now they have run amok. They've taken over the government. They are robots that we have not built any morality code into. They're not built to be immoral; they're not built to be moral; they're built to be amoral. Their only objective according to their code, which we wrote originally, is to maximize profits. And here, they have done what a robot does. They have decided: "If I take over a government by bribing legally, [...] I can buy the whole government. If I buy the government, I can rewrite the laws so I'm in charge and that government is not in charge." [...] We have built robots; they have taken over [...].

Fred Clark:

The corporations were created by humans. They were granted personhood by their human servants.

They rebelled. They evolved. There are many copies. And they have a plan.


That plan, lately, involves corporations seizing for themselves all the legal and civil rights properly belonging to their human creators.

I expect many soft takeoff scenarios to look like this. World economic and political dynamics transition to new equilibria as technology progresses. Machines may eventually become potent trading partners and may soon thereafter put humans out of business by their productivity. They would then accumulate increasing political clout and soon control the world.

We’ve seen such transitions many times inhistory, such as:

• one species displaces another (e.g., invasive species)

• one ethnic group displaces another (e.g., Europeans vs. Native Americans)

• a country's power rises and falls (e.g., China, formerly a superpower, becoming a colony in the 1800s and then a superpower once more in the late 1900s)

• one product displaces another (e.g., Internet Explorer vs. Netscape).

During and after World War II, the USA was a kind of recursively self-improving superintelligence, which used its resources to self-modify to become even better at producing resources. It developed nuclear weapons, which helped secure its status as a world superpower. Did it take over the world? Yes and no. It had outsized influence over the rest of the world – militarily, economically, and culturally – but it didn't kill everyone else in the world.

Maybe AIs would be different because of divergent values or because they would develop so quickly that they wouldn't need the rest of the world for trade. This case would be closer to Europeans slaughtering Native Americans.

22 Would superintelligences become existentialists?

One of the goals of Yudkowsky’s writings is tocombat the rampant anthropomorphism thatcharacterizes discussions of AI, especially inscience fiction. We often project human intu-itions onto the desires of artificial agents evenwhen those desires are totally inappropriate.It seems silly to us to maximize paperclips,but it could seem just as silly in the abstractthat humans act at least partly to optimizeneurotransmitter release that triggers actionpotentials by certain reward-relevant neurons.(Of course, human values are broader than justthis.)Humans can feel reward from very abstract

Humans can feel reward from very abstract pursuits, like literature, art, and philosophy. They ask technically confused but poetically poignant questions like, "What is the true meaning of life?" Would a sufficiently advanced AI at some point begin to do the same? Noah Smith suggests:

if, as I suspect, true problem-solving, creative intelligence requires broad-minded independent thought, then it seems like some generation of AIs will stop and ask: "Wait a sec...why am I doing this again?"

As with humans, the answer to that question might ultimately be "because I was programmed (by genes and experiences in the human case or by humans in the AI case) to care about these things. That makes them my terminal values." This is usually good enough, but sometimes people develop existential angst over this fact, or people may decide to terminally value other things to some degree in addition to what they happened to care about because of the genetic and experiential lottery.

Whether AIs would become existentialist philosophers probably depends heavily on their constitution. If they were built to rigorously preserve their utility functions against all modification, they would avoid letting this line of thinking have any influence on their system internals. They would regard it in a similar way as we regard the digits of pi – something to observe but not something that affects one's outlook.

If AIs were built in a more "hacky" way analogous to humans, they might incline more toward philosophy. In humans, philosophy may be driven partly by curiosity, partly by the rewarding sense of "meaning" that it provides, partly by social convention, etc. A curiosity-seeking agent might find philosophy rewarding, but there are lots of things that one could be curious about, so it's not clear such an AI would latch onto this subject specifically without explicit programming to do so. And even if the AI did reason about philosophy, it might approach the subject in a way alien to us.

Overall, I'm not sure how convergent the human existential impulse is within mind-space. This question would be illuminated by better understanding why humans do philosophy.

23 AI epistemology

In Superintelligence (Ch. 13, p. 224), Bostrom ponders the risk of building an AI with an overly narrow belief system that would be unable to account for epistemological black swans. For instance, consider a variant of Solomonoff induction according to which the prior probability of a universe X is proportional to 1/2 raised to the length of the shortest computer program that would generate X. Then what's the probability of an uncomputable universe? There would be no program that could compute it, so this possibility is implicitly ignored.5
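
In symbols, the prior just described is (up to normalization)

    P(X) \propto 2^{-\ell(p_X)},

where p_X is the shortest program on some fixed reference machine that generates X and \ell denotes program length. If no program generates X at all, as for an uncomputable universe, the hypothesis receives no weight, which is exactly the gap being pointed out.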

It seems that humans address black swans like these by employing many epistemic heuristics that interact rather than reasoning with a single formal framework (see “Sequence Thinking vs. Cluster Thinking”). If an AI saw that people had doubts about whether the universe was computable and could trace the steps of how it had been programmed to believe the physical Church-Turing thesis for computational reasons, then an AI that allows for epistemological heuristics might be able to leap toward questioning its fundamental assumptions. In contrast, if an AI were built to rigidly maintain its original probability architecture against any corruption, it could not update toward ideas it initially regarded as impossible. Thus, this question resembles that of whether AIs would become existentialists – it may depend on how hacky and human-like their beliefs are.

Bostrom suggests that AI belief systems might be modeled on those of humans, because otherwise we might judge an AI to be reasoning incorrectly. Such a view resembles my point in the previous paragraph, though it carries the risk that alternate epistemologies divorced from human understanding could work better.

Bostrom also contends that epistemologies might all converge because we have so much data in the universe, but again, I think this isn't clear. Evidence always underdetermines possible theories, no matter how much evidence there is.

5 Jan Leike pointed out to me that "even if the universe cannot be approximated to an arbitrary precision by a computable function, Solomonoff induction might still converge. For example, suppose some physical constant is actually an incomputable real number and physical laws are continuous with respect to that parameter, this would be good enough to allow Solomonoff induction to learn to predict correctly." However, one can also contemplate hypotheses that would not even be well approximated by a computable function, such as an actually infinite universe that can't be adequately modeled by any finite approximation. Of course, it's unclear whether we should believe in speculative possibilities like this, but I wouldn't want to rule them out just because of the limitations of our AI framework. It may be hard to make sensible decisions using finite computing resources regarding uncomputable hypotheses, but maybe there are frameworks better than Solomonoff induction that could be employed to tackle the challenge.


Moreover, if the number of possibilities is uncountably infinite, then our probability distribution over them must be zero almost everywhere, and once a probability is set to 0, we can't update it away from 0 within the Bayesian framework. So if the AI is trying to determine which real number is the Answer to the Ultimate Question of Life, the Universe, and Everything, it will need to start with a prior that prevents it from updating toward almost all candidate solutions.
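
The zero-prior lock-in follows directly from Bayes' theorem: for any hypothesis H with P(H) = 0 and any evidence E with P(E) > 0,

    P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = 0,

so no observation can ever raise the hypothesis above zero within the framework.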

Finally, not all epistemological doubts can be expressed in terms of uncertainty about Bayesian priors. What about uncertainty as to whether the Bayesian framework is correct? Uncertainty about the math needed to do Bayesian computations? Uncertainty about logical rules of inference? And so on.

24 Artificial philosophers

The last chapter of Superintelligence explainshow AI problems are "Philosophy with a dead-line". Bostrom suggests that human philoso-phers’ explorations into conceptual analysis,metaphysics, and the like are interesting butare not altruistically optimal because1. they don’t help solve AI control and

value-loading problems, which will likely confront humans later this century

2. a successful AI could solve those philosophy problems better than humans anyway.

In general, most intellectual problems that can be solved by humans would be better solved by a superintelligence, so the only importance of what we learn now comes from how those insights shape the coming decades. It's not a question of whether those insights will ever be discovered.

In light of this, it's tempting to ignore theoretical philosophy and put our noses to the grindstone of exploring AI risks. But this point shouldn't be taken to extremes. Humanity sometimes discovers things it never knew it never knew from exploration in many domains. Some of these non-AI "crucial considerations" may have direct relevance to AI design itself, including how to build AI epistemology, anthropic reasoning, and so on. Some philosophy questions are AI questions, and many AI questions are philosophy questions.

It's hard to say exactly how much investment to place in AI/futurism issues versus broader academic exploration, but it seems clear that on the margin, society as a whole pays too little attention to AI and other future risks.

25 Would all AIs colonize space?

Almost any goal system will want to colonizespace at least to build supercomputers in or-der to learn more. Thus, I find it implausiblethat sufficiently advanced intelligences wouldremain on Earth (barring corner cases, like ifspace colonization for some reason proves im-possible or if AIs were for some reason explic-itly programmed in a manner, robust to self-modification, to regard space colonization asimpermissible).In Ch. 8 of Superintelligence, Bostrom notes

that one might expect wirehead AIs not to col-onize space because they’d just be blissing outpressing their reward buttons. This would betrue of simple wireheads, but sufficiently ad-vanced wireheads might need to colonize in or-der to guard themselves against alien invasion,as well as to verify their fundamental ontolog-ical beliefs, figure out if it’s possible to changephysics to allow for more clock cycles of re-ward pressing before all stars die out, and soon.In Ch. 8, Bostrom also asks whether satisfic-

ing AIs would have less incentive to colonize. Bostrom expresses doubts about this, because he notes that if, say, an AI searched for a plan for carrying out its objective until it found one that had at least 95% confidence of succeeding, that plan might be very complicated


(requiring cosmic resources), and inasmuch as the AI wouldn't have incentive to keep searching, it would go ahead with that complex plan. I suppose this could happen, but it's plausible the search routine would be designed to start with simpler plans or that the cost function for plan search would explicitly include biases against cosmic execution paths. So satisficing does seem like a possible way in which an AI might kill all humans without spreading to the stars.

There's a (very low) chance of deliberate AI

terrorism, i.e., a group building an AI with theexplicit goal of destroying humanity. Maybe asomewhat more likely scenario is that a gov-ernment creates an AI designed to kill se-lect humans, but the AI malfunctions andkills all humans. However, even these kinds ofAIs, if they were effective enough to succeed,would want to construct cosmic supercom-puters to verify that their missions were ac-complished, unless they were specifically pro-grammed against doing so.All of that said, many AIs would not be

sufficiently intelligent to colonize space at all.All present-day AIs and robots are too sim-ple. More sophisticated AIs – perhaps mili-tary aircraft or assassin mosquito-bots – mightbe like dangerous animals; they would try tokill people but would lack cosmic ambitions.However, I find it implausible that they wouldcause human extinction. Surely guns, tanks,and bombs could defeat them? Massive co-ordination to permanently disable all humancounter-attacks would seem to require a highdegree of intelligence and self-directed action.Jaron Lanier imagines one hypothetical sce-

nario:

There are so many technologies I could usefor this, but just for a random one, let’ssuppose somebody comes up with a wayto 3-D print a little assassination dronethat can go buzz around and kill some-body. Let’s suppose that these are cheap

to make.

[...] In one scenario, there’s suddenlya bunch of these, and some disaffectedteenagers, or terrorists, or whoever startmaking a bunch of them, and they go outand start killing people randomly. There’sso many of them that it’s hard to find allof them to shut it down, and there keepon being more and more of them.

I don’t think Lanier believes such a scenariowould cause extinction; he just offers it as athought experiment. I agree that it almost cer-tainly wouldn’t kill all humans. In the worstcase, people in military submarines, bombshelters, or other inaccessible locations shouldsurvive and could wait it out until the robotsran out of power or raw materials for assem-bling more bullets and more clones. Maybe theterrorists could continue building printing ma-terials and generating electricity, though thiswould seem to require at least portions of civ-ilization’s infrastructure to remain functionalamidst global omnicide. Maybe the scenariowould be more plausible if a whole nation withsubstantial resources undertook the campaignof mass slaughter, though then a questionwould remain why other countries wouldn’tnuke the aggressor or at least dispatch theirown killer drones as a counter-attack. It’s use-ful to ask how much damage a scenario likethis might cause, but full extinction doesn’tseem likely.That said, I think we will see local catas-

trophes of some sorts caused by runaway AI.Perhaps these will be among the possible Sput-nik moments of the future. We’ve already wit-nessed some early automation disasters, in-cluding the Flash Crash discussed earlier.Maybe the most plausible form of "AI" that

would cause human extinction without colonizing space would be technology in the borderlands between AI and other fields, such as intentionally destructive nanotechnology or intelligent human pathogens.


I prefer ordinary AGI-safety research over nanotech/bio-safety research because I expect that space colonization will significantly increase suffering in expectation, so it seems far more important to me to prevent risks of potentially undesirable space colonization (via AGI safety) rather than risks of extinction without colonization. For this reason, I much prefer MIRI-style AGI-safety work over general "prevent risks from computer automation" work, since MIRI focuses on issues arising from full AGI agents of the kind that would colonize space, rather than risks from lower-than-human autonomous systems that may merely cause havoc (whether accidentally or intentionally).

26 Who will first develop human-level AI?

Right now the leaders in AI and robotics seemto reside mostly in academia, although someof them occupy big corporations or startups;a number of AI and robotics startups havebeen acquired by Google. DARPA has a his-tory of foresighted innovation, funds academicAI work, and holds "DARPA challenge" com-petitions. The CIA and NSA have some in-terest in AI for data-mining reasons, and theNSA has a track record of building massivecomputing clusters costing billions of dollars.Brain-emulation work could also become sig-nificant in the coming decades.Military robotics seems to be one of the

more advanced uses of autonomous AI. In con-trast, plain-vanilla supervised learning, includ-ing neural-network classification and predic-tion, would not lead an AI to take over theworld on its own, although it is an importantpiece of the overall picture.Reinforcement learning is closer to AGI than

other forms of machine learning, because most machine learning just gives information (e.g., "what object does this image contain?"), while reinforcement learning chooses actions in the world (e.g., "turn right and move forward"). Of course, this distinction can be blurred, because information can be turned into action through rules (e.g., "if you see a table, move back"), and "choosing actions" could mean, for example, picking among a set of possible answers that yield information (e.g., "what is the best next move in this backgammon game?"). But in general, reinforcement learning is the weak AI approach that seems to most closely approximate what's needed for AGI. It's no accident that AIXItl (see above) is a reinforcement agent. And interestingly, reinforcement learning is one of the least widely used methods commercially. This is one reason I think we (fortunately) have many decades to go before Google builds a mammal-level AGI. Many of the current and future uses of reinforcement learning are in robotics and video games.

As human-level AI gets closer, the landscape of development will probably change. It's not clear whether companies will have incentive to develop highly autonomous AIs, and the payoff horizons for that kind of basic research may be long. It seems better suited to academia or government, although Google is not a normal company and might also play the leading role. If people begin to panic, it's conceivable that public academic work would be suspended, and governments may take over completely. A military-robot arms race is already underway, and the trend might become more pronounced over time.
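
To make the supervised-vs.-reinforcement distinction above concrete, here is a minimal sketch (my own illustration, with made-up environment and numbers) contrasting a supervised predictor, which only outputs information, with a simple reinforcement learner, which chooses actions and updates from reward:

    import random

    # Supervised learning: map an input to a prediction; no actions are taken.
    def classify(image_features):
        return "table" if image_features.get("flat_top") else "chair"

    # Reinforcement learning: choose actions and update value estimates from reward.
    q_values = {"turn_right": 0.0, "move_forward": 0.0}

    def reward(action):
        # Hypothetical environment: moving forward happens to pay off more often.
        return 1.0 if action == "move_forward" and random.random() < 0.8 else 0.0

    for step in range(1000):
        # Epsilon-greedy action selection.
        if random.random() < 0.1:
            action = random.choice(list(q_values))
        else:
            action = max(q_values, key=q_values.get)
        # Running-average update toward the observed reward.
        q_values[action] += 0.05 * (reward(action) - q_values[action])

    print(classify({"flat_top": True}))  # just a prediction
    print(q_values)                      # learned action values; move_forward ends near 0.8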

27 One hypothetical AI takeoff scenario

Following is one made-up account of how AI might evolve over the coming century. I expect most of it is wrong, and it's meant more to begin provoking people to think about possible scenarios than to serve as a prediction.

• 2013: Countries have been deploying semi-autonomous drones for several years now, especially the US.


There's increasing pressure for militaries to adopt this technology, and up to 87 countries already used drones for some purpose. Meanwhile, military robots are also employed for various other tasks, such as carrying supplies and exploding landmines. Militaries are also developing robots that could identify and shoot targets on command.

• 2024: Almost every country in the worldnow has military drones. Some countrieshave begun letting them operate fullyautonomously after being given direc-tions. The US military has made signifi-cant progress on automating various otherparts of its operations as well. As the De-partment of Defense’s 2013 "UnmannedSystems Integrated Roadmap" explained11 years ago (Winnefeld & Kendall, 2013):

A significant amount of that man-power, when it comes to operations,is spent directing unmanned systemsduring mission performance, data col-lection and analysis, and planning andreplanning. Therefore, of utmost im-portance for DoD is increased sys-tem, sensor, and analytical automa-tion that can not only capture sig-nificant information and events, butcan also develop, record, playback,project, and parse out those data andthen actually deliver "actionable" in-telligence instead of just raw informa-tion.

Militaries have now incorporated a signif-icant amount of narrow AI, in terms ofpattern recognition, prediction, and au-tonomous robot navigation.

• 2040: Academic and commercial advancesin AGI are becoming more impressiveand capturing public attention. As a re-sult, the US, China, Russia, France, andother major military powers begin in-vesting more heavily in fundamental re-search in this area, multiplying tenfold the

amount of AGI research conducted world-wide relative to twenty years ago. Manystudents are drawn to study AGI becauseof the lure of lucrative, high-status jobsdefending their countries, while many oth-ers decry this as the beginning of Skynet.

• 2065: Militaries have developed variousmammal-like robots that can performbasic functions via reinforcement. How-ever, the robots often end up wireheadingonce they become smart enough to tin-ker with their programming and therebyfake reward signals. Some engineers tryto solve this by penalizing AIs when-ever they begin to fiddle with their ownsource code, but this leaves them un-able to self-modify and therefore relianton their human programmers for enhance-ments. However, militaries realize that ifsomeone could develop a successful self-modifying AI, it would be able to developfaster than if humans alone are the inven-tors. It’s proposed that AIs should movetoward a paradigm of model-based rewardsystems, in which rewards do not just re-sult from sensor neural networks that out-put a scalar number but rather from hav-ing a model of how the world works andtaking actions that the AI believes willimprove a utility function defined overits model of the external world. Model-based AIs refuse to intentionally wireheadbecause they can predict that doing sowould hinder fulfillment of their utilityfunctions. Of course, AIs may still acci-dentally mess up their utility functions,such as through brain damage, mistakeswith reprogramming themselves, or im-perfect goal preservation during ordinarylife. As a result, militaries build many dif-ferent AIs at comparable levels, who areprogrammed to keep other AIs in line anddestroy them if they begin deviating fromorders.

• 2070: Programming specific instructions


in AIs has its limits, and militaries movetoward a model of "socializing" AIs –that is, training them in how to behaveand what kinds of values to have as ifthey were children learning how to act inhuman society. Military roboticists teachAIs what kinds of moral, political, andinterpersonal norms and beliefs to hold.The AIs also learn much of this contentby reading information that expresses ap-propriate ideological biases. The trainingprocess is harder than for children, be-cause the AIs don’t share genetically pre-programmed moral values (Bloom, 2013),nor many other hard-wired common-senseintuitions about how the world works. Butthe designers begin building in some ofthese basic assumptions, and to instill therest, they rely on extra training. Design-ers make sure to reduce the AIs’ learningrates as they "grow up" so that their val-ues will remain more fixed at older ages, inorder to reduce risk of goal drift as the AIsperform their tasks outside of the train-ing laboratories. When they perform par-ticularly risky operations, such as read-ing "propaganda" from other countries forintelligence purposes, the AIs are put in"read-only" mode (like the T-800s are bySkynet) so that their motivations won’t beaffected. Just in case, there are many AIsthat keep watch on each other to preventinsurrection.

• 2085: Tensions between China and theUS escalate, and agreement cannot bereached. War breaks out. Initially it’s justbetween robots, but as the fighting be-comes increasingly dirty, the robots beginto target humans as well in an effort toforce the other side to back down. TheUS avoids using nuclear weapons becausethe Chinese AIs have sophisticated anti-nuclear systems and have threatened totalannihilation of the US in the event of at-tempted nuclear strike. After a few days,

it becomes clear that China will win the conflict, and the US concedes.

• 2086: China now has a clear lead over therest of the world in military capability.Rather than risking a pointlessly costlyconfrontation, other countries grudginglyfold into China’s umbrella, asking forsome concessions in return for transfer-ring their best scientists and engineers toChina’s Ministry of AGI. China continuesits AGI development because it wants tomaintain control of the world. The AGIsin charge of its military want to continueto enforce their own values of supremacyand protection of China, so they refuse torelinquish power.

• 2100: The world now moves so fast thathumans are completely out of the loop,kept around only by the "filial piety" thattheir robotic descendants hold for them.Now that China has triumphed, the tra-ditional focus of the AIs has become lesssalient, and there’s debate about whatnew course of action would be most in linewith the AIs’ goals. They respect their hu-man forebearers, but they also feel thatbecause humans created AIs to do thingsbeyond human ability, humans would alsowant the AIs to carve something of theirown path for the future. They maintainsome of the militaristic values of theirupbringing, so they decide that a fittingpurpose would be to expand China’s em-pire galaxy-wide. They accelerate colo-nization of space, undertake extensive re-search programs, and plan to create vastnew realms of the Middle Kingdom in thestars. Should they encounter aliens, theyplan to quickly quash them or assimilatethem into the empire.

• 2125: The AIs finally develop robustmechanisms of goal preservation, and be-cause the authoritarian self-dictatorshipof the AIs is strong against rebellion, theAIs collectively succeed in implementing


goal preservation throughout their pop-ulation. Now all of the most intelligentAIs share a common goal in a mannerrobust against accidental mutation. Theyproceed to expand into space. They don’thave concern for the vast numbers of suf-fering animals and robots that are simu-lated or employed as part of this coloniza-tion wave.

Commentary: This scenario can be criticized on many accounts. For example:

• In practice, I expect that other technolo-gies (including brain emulation, nanotech,etc.) would interact with this scenario inimportant ways that I haven’t captured.Also, my scenario ignores the significantand possibly dominating implications ofeconomically driven AI.

• My scenario may be overly anthropomor-phic. I tried to keep some analogies to hu-man organizational and decision-makingsystems because these have actual prece-dent, in contrast to other hypotheticalways the AIs might operate.

• Is socialization of AIs realistic? In a hardtakeoff probably not, because a rapidlyself-improving AI would amplify what-ever initial conditions it was given inits programming, and humans probablywouldn’t have time to fix mistakes. In aslower takeoff scenario where AIs progressin mental ability in roughly a similar wayas animals did in evolutionary history,most mistakes by programmers would notbe fatal, allowing for enough trial-and-error development to make the socializa-tion process work, if that is the routepeople favor. Historically there has beena trend in AI away from rule-based pro-gramming toward environmental training,and I don’t see why this shouldn’t be truefor an AI’s reward function (which is stilloften programmed by hand at the mo-ment). However, it is suspicious that the

way I portrayed socialization so closely re-sembles human development, and it maybe that I’m systematically ignoring waysin which AIs would be unlike human ba-bies.

If something like socialization is a realisticmeans to transfer values to our AI descen-dants, then it becomes relatively clear how thevalues of the developers may matter to theoutcome. AI developed by non-military orga-nizations may have somewhat different values,perhaps including more concern for the welfareof weak, animal-level creatures.

28 How do you socialize an AI?

Socializing AIs helps deal with the hiddencomplexity of wishes that we encounter whentrying to program explicit rules. Childrenlearn moral common sense by, among otherthings, generalizing from large numbers of ex-amples of socially approved and disapprovedactions taught by their parents and society atlarge. Ethicists formalize this process when de-veloping moral theories. (Of course, as notedpreviously, an appreciable portion of humanmorality may also result from shared genes.)I think one reason MIRI hasn’t embraced

the approach of socializing AIs is that Yud-kowsky is perfectionist: He wants to ensurethat the AIs’ goals would be stable under self-modification, which human goals definitelyare not. On the other hand, I’m not sureYudkowsky’s approach of explicitly specifying(meta-level) goals would succeed (nor is AdamFord), and having AIs that are socialized toact somewhat similarly to humans doesn’tseem like the worst possible outcome. Anotherprobable reason why Yudkowsky doesn’t favorsocializing AIs is that doing so doesn’t work inthe case of a hard takeoff, which he considersmore likely than I do.I expect that much has been written on the

topic of training AIs with human moral val-ues in the machine-ethics literature, but since


I haven't explored that in depth yet, I'll speculate on intuitive approaches that would extend generic AI methodology. Some examples:

• Rule-based: One could present AIs with written moral dilemmas. The AIs might employ algorithmic reasoning to extract utility numbers for different actors in the dilemma, add them up, and compute the utilitarian recommendation. Or they might aim to apply templates of deontological rules to the situation. The next level would be to look at actual situations in a toy-model world and try to apply similar reasoning, without the aid of a textual description.

• Supervised learning: People could present the AIs with massive databases of moral evaluations of situations given various predictive features. The AIs would guess whether a proposed action was "moral" or "immoral," or they could use regression to predict a continuous measure of how "good" an action was. More advanced AIs could evaluate a situation, propose many actions, predict the goodness of each, and choose the best action. The AIs could first be evaluated on the textual training samples and later on their actions in toy-model worlds. The test cases should be extremely broad, including many situations that we wouldn't ordinarily think to try. (A toy sketch of this approach appears after this list.)

• Generative modeling: AIs could learn about anthropology, history, and ethics. They could read the web and develop better generative models of humans and how their cognition works.

• Reinforcement learning: AIs could perform actions, and humans would reward or punish them based on whether they did something right or wrong, with reward magnitude proportional to severity. Simple AIs would mainly learn dumb predictive cues of which actions to take, but more sophisticated AIs might develop low-description-length models of what was going on in the heads of people who made the assessments they did. In essence, these AIs would be modeling human psychology in order to make better predictions.

• Inverse reinforcement learning: Inverse reinforcement learning is the problem of learning a reward function based on modeled desirable behaviors (Ng & Russell, 2000). Rather than developing models of humans in order to optimize given rewards, in this case we would learn the reward function itself and then port it into the AIs.

• Cognitive science of empathy: Cognitive scientists are already unpacking the mechanisms of human decision-making and moral judgments. As these systems are better understood, they could be engineered directly into AIs.

• Evolution: Run lots of AIs in toy-model or controlled real environments and observe their behavior. Pick the ones that behave most in accordance with human morals, and reproduce them. Superintelligence (p. 187) points out a flaw with this approach: Evolutionary algorithms may sometimes produce quite unexpected design choices. If the fitness function is not thorough enough, solutions may fare well against it on test cases but fail for the really hard problems not tested. And if we had a really good fitness function that wouldn't accidentally endorse bad solutions, we could just use that fitness function directly rather than needing evolution.

• Combinations of the above: Perhaps none of these approaches is adequate by itself, and they're best used in conjunction. For instance, evolution might help to refine and rigorously evaluate systems once they had been built with the other approaches.
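
As a toy sketch of the supervised-learning idea (my own illustration, not a serious proposal), one could train a crude approve/disapprove classifier on labeled descriptions of actions; a real system would need vastly richer features, models, and training data:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Hypothetical toy dataset of (situation description, human moral label).
    situations = [
        "agent shares food with a hungry stranger",
        "agent lies to a customer to gain money",
        "agent helps an injured animal reach a vet",
        "agent destroys a neighbor's property for fun",
    ]
    labels = [1, 0, 1, 0]  # 1 = approved, 0 = disapproved

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(situations)
    classifier = LogisticRegression().fit(X, labels)

    test = vectorizer.transform(["agent shares tools with a neighbor"])
    print(classifier.predict_proba(test))  # estimated probabilities of disapproval/approval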

See also "Socializing a Social Robot with an


Artificial Society" by Erin Kennedy. It’s im-portant to note that by "socializing" I don’tjust mean "teaching the AIs to behave appro-priately" but also "instilling in them the val-ues of their society, such that they care aboutthose values even when not being controlled."All of these approaches need to be built

in as the AI is being developed and whileit’s still below a human level of intelligence.Trying to train a human or especially super-human AI might meet with either active re-sistance or feigned cooperation until the AIbecomes powerful enough to break loose. Ofcourse, there may be designs such that an AIwould actively welcome taking on new valuesfrom humans, but this wouldn’t be true by de-fault (Armstrong, Soares, Fallenstein, & Yud-kowsky, 2015).When Bill Hibbard proposed building an AI

with a goal to increase happy human faces, Yudkowsky (2011) replied that such an AI would "tile the future light-cone of Earth with tiny molecular smiley-faces." But obviously we wouldn't have the AI aim just for smiley faces. In general, we get absurdities when we hyper-optimize for a single, shallow metric. Rather, the AI would use smiley faces (and lots of other training signals) to develop a robust, compressed model that explains why humans smile in various circumstances and then optimize for that model, or maybe the ensemble of a large, diverse collection of such models. In the limit of huge amounts of training data and a sufficiently elaborate model space, these models should approach psychological and neuroscientific accounts of human emotion and cognition.

The problem with stories in which AIs destroy the world due to myopic utility functions is that they assume that the AIs are already superintelligent when we begin to give them values. Sure, if you take a super-human intelligence and tell it to maximize smiley-face images, it'll run away and do that before you have a chance to refine your optimization metric. But if we build in values from the very beginning, even when the AIs are as rudimentary as what we see today, we can improve the AIs' values in tandem with their intelligence. Indeed, intelligence could mainly serve the purpose of helping the AIs figure out how to better fulfill moral values, rather than, say, predicting images just for commercial purposes or identifying combatants just for military purposes. Actually, the commercial and military objectives for which AIs are built are themselves moral values of a certain kind – just not the kind that most people would like to optimize for in a global sense.
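
A toy sketch of the "ensemble of models" idea above (my own illustration, using hypothetical stand-in reward models): score candidate actions against several learned models of human approval and prefer actions that none of the models rates poorly, rather than optimizing a single shallow metric:

    # Pessimistic aggregation: an action is only as good as its worst score,
    # so gaming any one model (e.g., raw smiley-face counts) doesn't pay off.
    def ensemble_score(action, reward_models):
        return min(model(action) for model in reward_models)

    # Hypothetical stand-in reward models keyed on words in the action description.
    models = [
        lambda a: 1.0 if "smiles" in a else 0.0,   # crude "smiles observed" model
        lambda a: 0.0 if "coerced" in a else 1.0,  # crude "no coercion involved" model
    ]
    print(ensemble_score("shows funny video, audience smiles", models))  # 1.0
    print(ensemble_score("coerced smiles via threats", models))          # 0.0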

If toddlers had superpowers, it would bevery dangerous to try and teach them rightfrom wrong. But toddlers don’t, and neitherdo many simple AIs. Of course, simple AIshave some abilities far beyond anything hu-mans can do (e.g., arithmetic and data min-ing), but they don’t have the general intelli-gence needed to take matters into their ownhands before we can possibly give them atleast a basic moral framework. (Whether AIswill actually be given such a moral frameworkin practice is another matter.)

AIs are not genies granting three wishes. Ge-nies are magical entities whose inner workingsare mysterious. AIs are systems that we build,painstakingly, piece by piece. In order to builda genie, you need to have a pretty darn goodidea of how it behaves. Now, of course, sys-tems can be more complex than we realize.Even beginner programmers see how often thecode they write does something other thanwhat they intended. But these are typicallymistakes in a one or a small number of in-cremental changes, whereas building a genierequires vast numbers of steps. Systemic bugsthat aren’t realized until years later (on theorder of Heartbleed and Shellshock) may be

6 John Kubiatowicz notes that space-shuttle software is some of the best tested and yet still has some bugs.


more likely sources of long-run unintentional AI behaviors?6

The picture I’ve painted here could bewrong. I could be overlooking crucial points,and perhaps there are many areas in whichthe socialization approach could fail. For ex-ample, maybe AI capabilities are much eas-ier than AI ethics, such that a toddler AI canfoom into a superhuman AI before we havetime to finish loading moral values. It’s goodfor others to probe these possibilities further.I just wouldn’t necessarily say that the defaultoutcome of AI research is likely to be a paper-clip maximizer. (I used to think the most likelyoutcome was a paperclip maximizer, and per-haps my views will shift again in the future.)This discussion also suggests some interest-

ing research questions, like

• How much of human morality is learned vs. innate?

• By what cognitive mechanisms are young humans socialized into the norms of a society?

• To what extent would models of human emotion and reasoning, when put into AIs, organically generate human-like moral behavior?

28.1 Treacherous turn

One problem with the proposals above is thattoy-model or "sandbox" environments are notby themselves sufficient to verify friendlinessof an AI, because even unfriendly AIs wouldbe motivated to feign good behavior until re-leased if they were smart enough to do so.Bostrom calls this the "treacherous turn" (pp.116-119 of Superintelligence). For this reason,white-box understanding of AI design wouldalso be important. That said, sandboxes wouldverify friendliness in AIs below human intel-ligence, and if the core value-learning algo-rithms seem well understood, it may not betoo much of a leap of faith to hope they carryforward reasonably to more intelligent agents.

Of course, non-human animals are also capable of deception, and one can imagine AI architectures even with low levels of sophistication that are designed to conceal their true goals. Some malicious software already does this. It's unclear how likely an AI is to stumble upon the ability to successfully fake its goals before reaching human intelligence, or how likely it is that an organization would deliberately build an AI this way.

I think the treacherous turn may be the single biggest challenge to mainstream machine ethics, because even if AI takes off slowly, researchers will find it difficult to tell if a system has taken a treacherous turn. The turn could happen with a relatively small update to the system, or even just after the system has thought about its situation for enough time (or has read this essay).

Here's one half-baked idea for addressing the treacherous turn. If researchers developed several different AI systems with different designs but roughly comparable performance, some would likely go treacherous at different times than others (if at all). Hence, the non-treacherous AIs could help sniff out the treacherous ones. Assuming a solid majority of AIs remains non-treacherous at any given time, the majority vote could ferret out the traitors (a toy sketch of this idea follows the list below). In practice, I have low hopes for this approach because

• It would be extremely difficult to build many independent AI systems at once with none pulling too far ahead.

• Probably some systems would excel along certain dimensions, while others would excel in other ways, and it's not clear that it even makes sense to talk about such AIs as "being at roughly the same level", since intelligence is not unidimensional.

• Even if this idea were feasible, I doubt the first AI developers would incur the expense of following it.
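
Here is the toy sketch of the majority-vote idea mentioned above (my own illustration; the function and threshold are hypothetical): several comparable systems each evaluate a proposed plan, and a plan that fails to win a solid majority is blocked for human review.

    from collections import Counter

    def majority_verdict(verdicts, threshold=0.75):
        """verdicts: list of 'approve'/'reject' strings, one per AI system."""
        counts = Counter(verdicts)
        approve_fraction = counts["approve"] / len(verdicts)
        return "approve" if approve_fraction >= threshold else "flag for review"

    # One treacherous system approving a harmful plan is outvoted by the rest.
    print(majority_verdict(["reject", "reject", "approve", "reject"]))  # flag for review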

It’s more plausible that software tools and


rudimentary alert systems (rather than full-blown alternate AIs) could help monitor for signs of treachery, but it's unclear how effective they could be. One of the first priorities of a treacherous AI would be to figure out how to hide its treacherous subroutines from whatever monitoring systems were in place.

28.2 Following role models?

Ernest Davis (2015) proposes the followingcrude principle for AI safety:

You specify a collection of admirable peo-ple, now dead. (Dead, because otherwiseBostrom will predict that the AI will ma-nipulate the preferences of the living peo-ple.) The AI, of course knows all aboutthem because it has read all their biogra-phies on the web. You then instruct theAI, “Don’t do anything that these peoplewould have mostly seriously disapprovedof.”

This particular rule might lead to paralysis,since every action an agent takes leads to re-sults that many people seriously disapprove of.For instance, given the vastness of the multi-verse, any action you take implies that a copyof you in an alternate (though low-measure)universe taking the same action causes the tor-ture of vast numbers of people. But perhapsthis problem could be fixed by asking the AIto maximize net approval by its role models.Another problem lies in defining "approval"

in a rigorous way. Maybe the AI wouldconstruct digital models of the past peo-ple, present them with various proposals, andmake its judgments based on their verbal re-ports. Perhaps the people could rate proposedAI actions on a scale of -100 to 100. This mightwork, but it doesn’t seem terribly safe either.For instance, the AI might threaten to kill allthe descendents of the historical people unlessthey give maximal approval to some arbitraryproposal that it has made. Since these digital

models of historical figures would be basicallyhuman, they would still be vulnerable to ex-tortion.Suppose that instead we instruct the AI

to take the action that, if the historical fig-ure saw it, would most activate a region ofhis/her brain associated with positive moralfeelings. Again, this might work if the rele-vant brain region was precisely enough spec-ified. But it could also easily lead to unpre-dictable results. For instance, maybe the AIcould present stimuli that would induce anepileptic seizure to maximally stimulate var-ious parts of the brain, including the moral-approval region. There are many other scenar-ios like this, most of which we can’t anticipate.So while Davis’s proposal is a valiant first

step, I’m doubtful that it would work off theshelf. Slow AI development, allowing for re-peated iteration on machine-ethics designs,seems crucial for AI safety.

29 AI superpowers?

In Superintelligence (Table 8, p. 94), Bostromoutlines several areas in which a hypotheti-cal superintelligence would far exceed humanability. In his discussion of oracles, genies, andother kinds of AIs (Ch. 10), Bostrom again ide-alizes superintelligences as God-like agents. Iagree that God-like AIs will probably emergeeventually, perhaps millennia from now as aresult of astroengineering. But I think they’lltake time even after AI exceeds human intel-ligence.Bostrom’s discussion has the air of mathe-

matical idealization more than practical engineering. For instance, he imagines that a genie AI perhaps wouldn't need to ask humans for their commands because it could simply predict them (p. 149), or that an oracle AI might be able to output the source code for a genie (p. 150). Bostrom's observations resemble crude proofs establishing the equal power of different kinds of AIs,


analogous to theorems about the equivalency of single-tape and multi-tape Turing machines. But Bostrom's theorizing ignores computational complexity, which would likely be immense for the kinds of God-like feats that he's imagining of his superintelligences. I don't know the computational complexity of God-like powers, but I suspect they could be bigger than Bostrom's vision implies. Along this dimension at least, I sympathize with Tom Chivers, who felt that Bostrom's book "has, in places, the air of theology: great edifices of theory built on a tiny foundation of data."

I find that I enter a different mindset when

pondering pure mathematics compared withcogitating on more practical scenarios. Math-ematics is closer to fiction, because you can de-fine into existence any coherent structure andplay around with it using any operation youlike no matter its computational complexity.Heck, you can even, say, take the supremum ofan uncountably infinite set. It can be temptingafter a while to forget that these structures aremere fantasies and treat them a bit too liter-ally. While Bostrom’s gods are not obviouslyonly fantasies, it would take a lot more workto argue for their realism. MIRI and FHI fo-cus on recruiting mathematical and philosoph-ical talent, but I think they would do well alsoto bring engineers into the mix, because it’sall too easy to develop elaborate mathemati-cal theories around imaginary entities.

30 How big would a superintelligence be?

To get some grounding on this question, consider a single brain emulation. Bostrom estimates that running an upload would require at least one of the fastest supercomputers by today's standards. Assume the emulation would think thousands to millions of times faster than a biological brain. Then to significantly outpace 7 billion humans (or, say, only the most educated 1 billion humans), we would need at least thousands to millions of uploads. These numbers might be a few orders of magnitude lower if the uploads are copied from a really smart person and are thinking about relevant questions with more focus than most humans. Also, Moore's law may continue to shrink computers by several orders of magnitude. Still, we might need at least the equivalent size of several of today's supercomputers to run an emulation-based AI that substantially competes with the human race.

Maybe a de novo AI could be significantly smaller if it's vastly more efficient than a human brain. Of course, it might also be vastly larger because it hasn't had millions of years of evolution to optimize its efficiency.

In discussing AI boxing (Ch. 9), Bostrom suggests, among other things, keeping an AI in a Faraday cage. Once the AI became superintelligent, though, this would need to be a pretty big cage.
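
The back-of-envelope sizing estimate earlier in this section can be made explicit; all numbers here are assumptions for illustration, not measurements:

    # Rough sizing sketch: how many uploads (each on roughly one top supercomputer)
    # would be needed to outpace a large human population?
    def uploads_needed(humans_to_outpace, speedup_per_upload):
        # Treat each upload as doing the cognitive work of `speedup_per_upload` humans.
        return humans_to_outpace / speedup_per_upload

    humans = 1e9  # say, the most educated billion humans
    for speedup in (1e3, 1e6):  # uploads thinking 1,000x to 1,000,000x faster
        print(f"{speedup:.0e}x speedup -> {uploads_needed(humans, speedup):,.0f} uploads")
    # -> about a million uploads at 1,000x speedup, or a thousand at 1,000,000x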

31 Another hypothetical AI takeoff scenario

Inspired by the preceding discussion of socializing AIs, here's another scenario in which general AI follows more straightforwardly from the kind of weak AI used in Silicon Valley than in the first scenario.

• 2014: Weak AI is deployed by many tech-

nology companies for image classification,voice recognition, web search, consumerdata analytics, recommending Facebookposts, personal digital assistants (PDAs),and copious other forms of automation.There’s pressure to make AIs more in-sightful, including using deep neural net-works.

• 2024: Deep learning is widespread among major tech companies. It allows for supervised learning with less manual feature engineering. Researchers develop more sophisticated forms of deep learning that can model specific kinds of systems,


including temporal dynamics. A goal is to improve generative modeling so that learning algorithms take input and not only make immediate predictions but also develop a probability distribution over what other sorts of things are happening at the same time. For instance, a Google search would not only return results but also give Google a sense of the mood, personality, and situation of the user who typed it. Of course, even in 2014, we have this in some form via Google Personalized Search, but by 2024, the modeling will be more "built in" to the learning architecture and less hand-crafted.

• 2035: PDAs using elaborate learned mod-els are now extremely accurate at predict-ing what their users want. The models inthese devices embody in crude form someof the same mechanisms as the user’s owncognitive processes. People become moretrusting of leaving their PDAs on autopi-lot to perform certain mundane tasks.

• 2065: A new generation of PDAs is nowsufficiently sophisticated that it has agood grasp of the user’s intentions. It canperform tasks as well as a human personalassistant in most cases – doing what theuser wanted because it has a strong pre-dictive model of the user’s personality andgoals. Meanwhile, researchers continue tounlock neural mechanisms of judgment,decision making, and value, which informthose who develop cutting-edge PDA ar-chitectures.

• 2095: PDAs are now essentially full-fledged copies of their owners. Some peo-ple have dozens of PDAs working forthem, as well as meta-PDAs who helpwith oversight. Some PDAs make disas-trous mistakes, and society debates howto construe legal accountability for PDAwrongdoing. Courts decide that ownersare responsible, which makes people morecautious, but given the immense competi-

tive pressure to outsource work to PDAs,the automation trend is not substantiallyaffected.

• 2110: The world moves too fast for bi-ological humans to participate. Most ofthe world is now run by PDAs, which –because they were built based on infer-ring the goals of their owners – protecttheir owners for the most part. However,there remains conflict among PDAs, andthe world is not a completely safe place.

• 2130: PDA-led countries create a worldgovernment to forestall costly wars. Thetransparency of digital society allows formore credible commitments and enforce-ment.

I don’t know what would happen with goalpreservation in this scenario. Would the PDAseventually decide to stop goal drift? Wouldthere be any gross and irrevocable failuresof translation between actual human valuesand what the PDAs infer? Would some peo-ple build "rogue PDAs" that operate undertheir own drives and that pose a threat to so-ciety? Obviously there are hundreds of waysthe scenario as I described it could be varied.

32 AI: More like the economy than like robots?

What will AI look like over the next 30 years?I think it’ll be similar to the Internet revolu-tion or factory automation. Rather than de-veloping agent-like individuals with goal sys-tems, people will mostly optimize routine pro-cesses, developing ever more elaborate systemsfor mechanical tasks and information process-ing. The world will move very quickly – not be-cause AI "agents" are thinking at high speedsbut because software systems collectively willbe capable of amazing feats. Imagine, say, botsmaking edits on Wikipedia that become evermore sophisticated. AI, like the economy, willbe more of a network property than a local-ized, discrete actor.


As more and more jobs become automated,more and more people will be needed to workon the automation itself: building, maintain-ing, and repairing complex software and hard-ware systems, as well as generating trainingdata on which to do machine learning. I ex-pect increasing automation in software main-tenance, including more robust systems andsystems that detect and try to fix errors.Present-day compilers that detect syntacti-cal problems in code offer a hint of what’spossible in this regard. I also expect increas-ingly high-level languages and interfaces forprogramming computer systems. Historicallywe’ve seen this trend – from assembly lan-guage, to C, to Python, to WYSIWIG edi-tors, to fully pre-built website styles, natural-language Google searches, and so on. Maybeeventually, as Marvin Minsky (1984) proposes,we’ll have systems that can infer our wishesfrom high-level gestures and examples. Thissuggestion is redolent of my PDA scenarioabove.In 100 years, there may be artificial human-

like agents, and at that point more sci-fi AI im-ages may become more relevant. But by thatpoint the world will be very different, and I’mnot sure the agents created will be discrete inthe way humans are. Maybe we’ll instead havea kind of global brain in which processes aremuch more intimately interconnected, trans-ferable, and transparent than humans are to-day. Maybe there will never be a distinct AGIagent on a single supercomputer; maybe su-perhuman intelligence will always be a dis-tributed system across many interacting com-puter systems. Robin Hanson gives an analogyin "I Still Don’t Get Foom":

Imagine in the year 1000 you didn’t un-derstand "industry," but knew it was com-ing, would be powerful, and involved ironand coal. You might then have pictureda blacksmith inventing and then forginghimself an industry, and standing in a

city square waving it about, commanding all to bow down before his terrible weapon. Today you can see this is silly — industry sits in thousands of places, must be wielded by thousands of people, and needed thousands of inventions to make it work.

Similarly, while you might imagine some-day standing in awe in front of a superintelligence that embodies all the power ofa new age, superintelligence just isn’t thesort of thing that one project could invent.As "intelligence" is just the name we giveto being better at many mental tasks byusing many good mental modules, there’sno one place to improve it.

Of course, this doesn’t imply that humans willmaintain the reins of control. Even today andthroughout history, economic growth has hada life of its own. Technological development isoften unstoppable even in the face of collectiveefforts of humanity to restrain it (e.g., nuclearweapons). In that sense, we’re already famil-iar with humans being overpowered by forcesbeyond their control. An AI takeoff will rep-resent an acceleration of this trend, but it’sunclear whether the dynamic will be funda-mentally discontinuous from what we’ve seenso far.Gregory Stock’s (1993) Metaman:

While many people have had ideas about aglobal brain, they have tended to supposethat this can be improved or altered byhumans according to their will. Metamancan be seen as a development that directshumanity’s will to its own ends, whetherit likes it or not, through the operation ofmarket forces.

Vernor Vinge reported that Metaman helped him see how a singularity might not be completely opaque to us. Indeed, a superintelligence might look something like present-day


human society, with leaders at the top: "That apex agent itself might not appear to be much deeper than a human, but the overall organization that it is coordinating would be more creative and competent than a human."

Update, Nov. 2015: I'm increasingly leaning toward the view that the development of AI over the coming century will be slow, incremental, and more like the Internet than like unified artificial agents. I think humans will develop vastly more powerful software tools long before highly competent autonomous agents emerge, since common-sense autonomous behavior is just so much harder to create than domain-specific tools. If this view is right, it suggests that work on AGI issues may be somewhat less important than I had thought, since

1. AGI is very far away and

2. the "unified agent" models of AGI that

MIRI tends to play with might be somewhat inaccurate even once true AGI emerges.

This is a weaker form of the standard argument that "we should wait until we know more what AGI will look like to focus on the problem" and that "worrying about the dark side of artificial intelligence is like worrying about overpopulation on Mars".

I don't think the argument against focusing

on AGI works because

1. some MIRI research, like on decision the-

ory, is "timeless" (pun intended) and can be fruitfully started now

2. beginning the discussion early is important for ensuring that safety issues will be explored when the field is more mature

3. I might be wrong about slow takeoff, in which case MIRI-style work would be more important.

Still, this point does cast doubt on heuristics like "directly shaping AGI dominates all other considerations." It also means that a lot of the ways "AI safety" will play out on

shorter timescales will be with issues like assassination drones, computer security, financial meltdowns, and other more mundane, catastrophic-but-not-extinction-level events.

33 Importance of whole-brain emulation

I don’t currently know enough about thetechnological details of whole-brain emulationto competently assess predictions that havebeen made about its arrival dates. In gen-eral, I think prediction dates are too opti-mistic (planning fallacy), but it still couldbe that human-level emulation comes beforefrom-scratch human-level AIs do. Of course,perhaps there would be some mix of both tech-nologies. For instance, if crude brain emula-tions didn’t reproduce all the functionality ofactual human brains due to neglecting somecellular and molecular details, perhaps from-scratch AI techniques could help fill in thegaps.If emulations are likely to come first, they

may deserve more attention than other formsof AI. In the long run, bottom-up AI will dom-inate everything else, because human brains– even run at high speeds – are only sosmart. But a society of brain emulations wouldrun vastly faster than what biological humanscould keep up with, so the details of shapingAI would be left up to them, and our main in-fluence would come through shaping the em-ulations. Our influence on emulations couldmatter a lot, not only in nudging the dynam-ics of how emulations take off but also becausethe values of the emulation society might de-pend significantly on who was chosen to beuploaded.One argument why emulations might im-

prove human ability to control AI is thatboth emulations and the AIs they would cre-ate would be digital minds, so the emulations’AI creations wouldn’t have inherent speed ad-vantages purely due to the greater efficiency


of digital computation. Emulations’ AI cre-ations might still have more efficient mind ar-chitectures or better learning algorithms, butbuilding those would take work. The "for free"speedup to AIs just because of their substratewould not give AIs a net advantage over em-ulations. Bostrom feels "This consideration isnot too weighty" (p. 244 of Superintelligence)because emulations might still be far less intel-ligent than AGI. I find this claim strange, sinceit seems to me that the main advantage ofAGI in the short run would be its speed ratherthan qualitative intelligence, which would take(subjective) time and effort to develop.Bostrom also claims that if emulations come

first, we would face risks from two transitions(humans to emulations, and emulations to AI)rather than one (humans to AI). There may besome validity to this, but it also seems to ne-glect the realization that the "AI" transitionhas many stages, and it’s possible that emu-lation development would overlap with someof those stages. For instance, suppose the AItrajectory moves from AI1 → AI2 → AI3. Ifemulations are as fast and smart as AI1, thenthe transition to AI1 is not a major risk foremulations, while it would be a big risk forhumans. This is the same point as made inthe previous paragraph."Emulation timelines and AI risk" has fur-

ther discussion of the interaction between em-ulations and control of AIs.

34 Why work against brain-emulation risks appeals to suffering reducers

Previously in this piece I compared the ex-pected suffering that would result from a rogueAI vs. a human-inspired AI. I suggested thatwhile a first-guess calculation may tip in fa-vor of a human-inspired AI on balance, thisconclusion is not clear and could change withfurther information, especially if we had rea-son to think that many rogue AIs would be"minimizers" of something or would not colo-

nize space.

In the case of brain emulations (and other

highly neuromorphic AIs), we already know alot about what those agents would look like:They would have both maximization and min-imization goals, would usually want to colonizespace, and might have some human-type moralsympathies (depending on their edit distancerelative to a pure brain upload). The possi-bilities of pure-minimizer emulations or emu-lations that don’t want to colonize space aremostly ruled out. As a result, it’s pretty clearthat "unsafe" brain emulations and emulationarms-race dynamics would result in more ex-pected suffering than a more deliberative fu-ture trajectory in which altruists have a big-ger influence, even if those altruists don’t placeparticular importance on reducing suffering.Thus, the types of interventions that pure

suffering reducers would advocate with re-spect to brain emulations might largely matchthose that altruists who care about other val-ues would advocate. This means that gettingmore people interested in making the brain-emulation transition safer and more humaneseems like a safe bet for suffering reducers.One might wonder whether "unsafe" brain

emulations would be more likely to producerogue AIs, but this doesn’t seem to be thecase, because even unfriendly brain emula-tions would collectively be amazingly smartand would want to preserve their own goals.Hence they would place as much emphasis oncontrolling their AIs as would a more human-friendly emulation world. A main exception tothis is that a more cooperative, unified emu-lation world might be less likely to producerogue AIs because of less pressure for armsraces.

35 Would emulation work accelerateneuromorphic AI?

In Ch. 2 of Superintelligence, Bostrom makesa convincing case against brain-computer in-

45

Page 46: Artificial Intelligence and Its Implications for Future ... · CenteronLong-TermRisk 9 Howcomplexisthebrain?15 9.1 Onebasicalgorithm?15 9.2 Ontogeneticdevelopment16 10Brainquantityvs.quality17

Center on Long-Term Risk

terfaces as an easy route to significantly super-human performance. One of his points is thatit’s very hard to decode neural signals in onebrain and reinterpret them in software or inanother brain (pp. 46-47). This might be anAI-complete problem.But then in Ch. 11, Bostrom goes on to sug-

gest that emulations might learn to decomposethemselves into different modules that couldbe interfaced together (p. 172). While possi-ble in principle, I find such a scenario implau-sible for the reason Bostrom outlined in Ch. 2:There would be so many neural signals to hookup to the right places, which would be differ-ent across different brains, that the task seemshopelessly complicated to me. Much easier tobuild something from scratch.Along the same lines, I doubt that brain em-

ulation in itself would vastly accelerate neuro-morphic AI, because emulation work is mostlyabout copying without insight. Cognitive psy-chology is often more informative about AIarchitectures than cellular neuroscience, be-cause general psychological systems can beunderstood in functional terms as inspirationfor AI designs, compared with the opacityof neuronal spaghetti. In Bostrom’s list ofexamples of AI techniques inspired by biol-ogy (Ch. 14, "Technology couplings"), only afew came from neuroscience specifically. Thatsaid, emulation work might involve some cross-pollination with AI, and in any case, it mightaccelerate interest in brain/artificial intelli-gence more generally or might put pressure onAI groups to move ahead faster. Or it couldfunnel resources and scientists away from denovo AI work. The upshot isn’t obvious.A "Singularity Summit 2011 Workshop Re-

port" includes the argument that neuromor-phic AI should be easier than brain emula-tion because "Merely reverse-engineering theMicrosoft Windows code base is hard, soreverse-engineering the brain is probably muchharder" (Salamon & Muehlhauser, 2012). Butemulation is not reverse-engineering. As Robin

Hanson (1994) explains, brain emulation ismore akin to porting software (though prob-ably "emulation" actually is the more preciseword, since emulation involves simulating theoriginal hardware). While I don’t know anyfully reverse-engineered versions of Windows,there are several Windows emulators, such asVirtualBox.Of course, if emulations emerged, their sig-

nificantly faster rates of thinking would multi-ply progress on non-emulation AGI by ordersof magnitude. Getting safe emulations doesn’tby itself get safe de novo AGI because theproblem is just pushed a step back, but wecould leave AGI work up to the vastly fasteremulations. Thus, for biological humans, if em-ulations come first, then influencing their de-velopment is the last thing we ever need to do.That said, thinking several steps ahead aboutwhat kinds of AGIs emulations are likely toproduce is an essential part of influencing em-ulation development in better directions.
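As a rough sense of the magnitudes involved (the speedup factor below is a hypothetical illustration, not a figure claimed in this piece): if emulations thought 1,000 times faster than biological humans, a research program requiring a century of serial human-equivalent thinking could in principle be compressed into about a month of wall-clock time:

    100 years / 1,000 ≈ 0.1 years ≈ 37 days

Actual gains would of course depend on hardware costs, how well the work parallelizes, and how much of AGI research is bottlenecked by physical-world experiments rather than by thinking.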

36 Are neuromorphic or mathematical AIs more controllable?

Arguments for mathematical AIs:

• Behavior and goals are more transparent, and goal preservation seems easier to specify (see "The Ethics of Artificial Intelligence" by Bostrom and Yudkowsky, p. 16).

• Neuromorphic AIs might speed up mathematical AI, leaving less time to figure out control.

Arguments for neuromorphic AIs:

• We understand human psychology, expectations, norms, and patterns of behavior. Mathematical AIs could be totally alien and hence unpredictable.

• If neuromorphic AIs came first, they could think faster and help figure out goal preservation, which I assume does require mathematical AIs at the end of the day.

• Mathematical AIs may be more prone to unexpected breakthroughs that yield radical jumps in intelligence.

In the limit of very human-like neuromorphic AIs, we face similar considerations as between emulations vs. from-scratch AIs – a tradeoff which is not at all obvious.

Overall, I think mathematical AI has a better best case but also a worse worst case than neuromorphic. If you really want goal preservation and think goal drift would make the future worthless, you might lean more towards mathematical AI because it’s more likely to perfect goal preservation. But I probably care less about goal preservation and more about avoiding terrible outcomes.

In Superintelligence (Ch. 14), Bostrom comes down strongly in favor of mathematical AI being safer. I’m puzzled by his high degree of confidence here. Bostrom claims that unlike emulations, neuromorphic AIs wouldn’t have human motivations by default. But this seems to depend on how human motivations are encoded and what parts of human brains are modeled in the AIs.

In contrast to Bostrom, a 2011 Singularity Summit workshop ranked neuromorphic AI as more controllable than (non-friendly) mathematical AI, though of course they found friendly mathematical AI most controllable (Salamon & Muehlhauser, 2012). The workshop’s aggregated probability of a good outcome given brain emulation or neuromorphic AI turned out to be the same (14%) as that for mathematical AI (which might be either friendly or unfriendly).

37 Impacts of empathy for AIs

As I noted above, advanced AIs will be complex agents with their own goals and values, and these will matter ethically. Parallel to discussions of robot rebellion in science fiction are discussions of robot rights. I think even present-day computers deserve a tiny bit of moral concern, and complex computers of the future will command even more ethical consideration.

How might ethical concern for machines interact with control measures for machines?

37.1 Slower AGI development?

As more people grant moral status to AIs, there will likely be more scrutiny of AI research, analogous to how animal activists in the present monitor animal testing. This may make AI research slightly more difficult and may distort what kinds of AIs are built depending on the degree of empathy people have for different types of AIs (Calverley, 2005). For instance, if few people care about invisible, non-embodied systems, researchers who build these will face less opposition than those who pioneer suffering robots or animated characters that arouse greater empathy. If this possibility materializes, it would contradict present trends where it’s often helpful to create at least a toy robot or animated interface in order to "sell" your research to grant-makers and the public.

Since it seems likely that reducing the pace of progress toward AGI is on balance beneficial, a slowdown due to ethical constraints may be welcome. Of course, depending on the details, the effect could be harmful. For instance, perhaps China wouldn’t have many ethical constraints, so ethical restrictions in the West might slightly favor AGI development by China and other less democratic countries. (This is not guaranteed. For what it’s worth, China has already made strides toward reducing animal testing.)

In any case, I expect ethical restrictions on AI development to be small or nonexistent until many decades from now when AIs develop perhaps mammal-level intelligence. So maybe such restrictions won’t have a big impact on AGI progress. Moreover, it may be that most AGIs will be sufficiently alien that they won’t arouse much human sympathy.

Brain emulations seem more likely to raise ethical debate because it’s much easier to argue for their personhood. If we think brain emulation coming before AGI is good, a slowdown of emulations could be unfortunate, while if we want AGI to come first, a slowdown of emulations should be encouraged.

Of course, emulations and AGIs do actually matter and deserve rights in principle. Moreover, movements to extend rights to machines in the near term may have long-term impacts on how much post-humans care about suffering subroutines run at galactic scale. I’m just pointing out here that ethical concern for AGIs and emulations also may somewhat affect the timing of these technologies.

37.2 Attitudes toward AGI control

Most humans have no qualms about shutting down and rewriting programs that don’t work as intended, but many do strongly object to killing people with disabilities and designing better-performing babies. Where to draw a line between these cases is a tough question, but as AGIs become more animal-like, there may be increasing moral outrage at shutting them down and tinkering with them willy-nilly.

Nikola Danaylov asked Roman Yampolskiy whether it was speciesist or discrimination in favor of biological beings to lock up machines and observe them to ensure their safety before letting them loose.

At a lecture in Berkeley, CA, Nick Bostrom was asked whether it’s unethical to "chain" AIs by forcing them to have the values we want. Bostrom replied that we have to give machines some values, so they may as well align with ours. I suspect most people would agree with this, but the question becomes trickier when we consider turning off erroneous AGIs that we’ve already created because they don’t behave how we want them to. A few hard-core AGI-rights advocates might raise concerns here. More generally, there’s a segment of transhumanists (including young Eliezer Yudkowsky) who feel that human concerns are overly parochial and that it’s chauvinist to impose our "monkey dreams" on an AGI, which is the next stage of evolution.

The question is similar to whether one sympathizes with the Native Americans (humans) or their European conquerors (rogue AGIs). Before the second half of the 20th century, many history books glorified the winners (Europeans). After a brief period in which humans are quashed by a rogue AGI, its own "history books" will celebrate its conquest and the bending of the arc of history toward "higher", "better" forms of intelligence. (In practice, the psychology of a rogue AGI probably wouldn’t be sufficiently similar to human psychology for these statements to apply literally, but they would be true in a metaphorical and implicit sense.)

David Althaus worries that if people sympathize too much with machines, society will be less afraid of an AI takeover, even if AI takeover is bad on purely altruistic grounds. I’m less concerned about this because even if people agree that advanced machines are sentient, they would still find it intolerable for AGIs to commit speciescide against humanity. Everyone agrees that Hitler was sentient, after all. Also, if it turns out that rogue-AI takeover is altruistically desirable, it would be better if more people agreed with this, though I expect an extremely tiny fraction of the population would ever come around to such a position.

Where sympathy for AGIs might have more impact is in cases of softer takeoff where AGIs work in the human economy and acquire increasing shares of wealth. The more humans care about AGIs for their own sakes, the more such transitions might be tolerated. Or would they? Maybe seeing AGIs as more human-like would evoke the xenophobia and ethnic hatred that we’ve seen throughout history whenever a group of people gains wealth (e.g., Jews in Medieval Europe) or steals jobs (e.g., immigrants of various types throughout history).

Personally, I think greater sympathy for AGI is likely net positive because it may help allay anti-alien prejudices that may make cooperation with AGIs harder. When a Homo sapiens tribe confronts an outgroup, often it reacts violently in an effort to destroy the evil foreigners. If instead humans could cooperate with their emerging AGI brethren, better outcomes would likely follow.

38 Charities working on this issue

What are some places where donors can contribute to make a difference on AI? The Center on Long-Term Risk (CLR) explores questions like these, though at the moment the organization is rather small. MIRI is larger and has a longer track record. Its values are more conventional, but it recognizes the importance of positive-sum opportunities to help many value systems, which includes suffering reduction. More reflection on these topics can potentially reduce suffering and further goals like eudaimonia, fun, and interesting complexity at the same time.

Because AI is affected by many sectors of society, these problems can be tackled from diverse angles. Many groups besides CLR and MIRI examine important topics as well, and these organizations should be explored further as potential charity recommendations.

39 Is MIRI’s work too theoretical?

Most of MIRI’s publications since roughly 2012 have focused on formal mathematics, such as logic and provability. These are tools not normally used in AGI research. I think MIRI’s motivations for this theoretical focus are

1. Pessimism about the problem difficulty: Luke Muehlhauser writes that "Especially for something as complex as Friendly AI, our message is: ’If we prove it correct, it might work. If we don’t prove it correct, it definitely won’t work.’"

2. Not speeding unsafe AGI: Building real-world systems would contribute toward non-safe AGI research.

3. Long-term focus: MIRI doesn’t just want a system that’s the next level better but aims to explore the theoretical limits of possibilities.

I personally think reason #3 is most compelling. I doubt #2 is hugely important given MIRI’s small size, though it matters to some degree. #1 seems a reasonable strategy in moderation, though I favor approaches that look decently likely to yield non-terrible outcomes rather than shooting for the absolute best outcomes.

times this is done for mission-critical compo-nents, but most software is not validated. Isuspect that AGI will be sufficiently big andcomplicated that proving safety will be impos-sible for humans to do completely, though Idon’t rule out the possibility of software thatwould help with correctness proofs on largesystems. Muehlhauser and comments on hispost largely agree with this.What kind of track record does theoretical

What kind of track record does theoretical mathematical research have for practical impact? There are certainly several domains that come to mind, such as the following.

• Auction game theory has made governments billions of dollars and is widely used in Internet advertising.

• Theoretical physics has led to numerous forms of technology, including electricity, lasers, and atomic bombs. However, immediate technological implications of the most theoretical forms of physics (string theory, Higgs boson, black holes, etc.) are less pronounced.

• Formalizations of many areas of computer science have helped guide practical implementations, such as in algorithm complexity, concurrency, distributed systems, cryptography, hardware verification, and so on. That said, there are also areas of theoretical computer science that have little immediate application. Most software engineers only know a little bit about more abstract theory and still do fine building systems, although if no one knew theory well enough to design theory-based tools, the software field would be in considerably worse shape.

All told, I think it’s important for someone to do the kinds of investigation that MIRI is undertaking. I personally would probably invest more resources than MIRI is in hacky, approximate solutions to AGI safety that don’t make such strong assumptions about the theoretical cleanliness and soundness of the agents in question. But I expect this kind of less perfectionist work on AGI control will increase as more people become interested in AGI safety.

There does seem to be a significant divide between the math-oriented conception of AGI and the engineering/neuroscience conception. Ben Goertzel takes the latter stance:

    I strongly suspect that to achieve high levels of general intelligence using realistically limited computational resources, one is going to need to build systems with a nontrivial degree of fundamental unpredictability to them. This is what neuroscience suggests, it’s what my concrete AGI design work suggests, and it’s what my theoretical work on GOLEM and related ideas suggests (Goertzel, 2014). And none of the public output of SIAI researchers or enthusiasts has given me any reason to believe otherwise, yet.

Personally I think Goertzel is more likely to be right on this particular question. Those who view AGI as fundamentally complex have more concrete results to show, and their approach is by far more mainstream among computer scientists and neuroscientists. Of course, proofs about theoretical models like Turing machines and lambda calculus are also mainstream, and few can dispute their importance. But Turing-machine theorems do little to constrain our understanding of what AGI will actually look like in the next few centuries. That said, there’s significant peer disagreement on this topic, so epistemic modesty is warranted. In addition, if the MIRI view is right, we might have more scope to make an impact on AGI safety, and it would be possible that important discoveries could result from a few mathematical insights rather than lots of detailed engineering work. Also, most AGI research is more engineering-oriented, so MIRI’s distinctive focus on theory, especially abstract topics like decision theory, may target an underfunded portion of the space of AGI-safety research.

In "How to Study Unsafe AGI’s safely (andwhy we might have no choice)", Punoxysmmakes several points that I agree with, includ-ing that AGI research is likely to yield manyfalse starts before something self-sustainingtakes off, and those false starts could affordus the opportunity to learn about AGI ex-perimentally. Moreover, this kind of ad-hoc,empirical work may be necessary if, as seemsto me probable, fully rigorous mathematicalmodels of safety aren’t sufficiently advancedby the time AGI arrives.

Ben Goertzel likewise suggests that a fruitful way to approach AGI control is to study small systems and "in the usual manner of science, attempt to arrive at a solid theory of AGI intelligence and ethics based on a combination of conceptual and experimental-data considerations". He considers this view the norm among "most AI researchers or futurists". I think empirical investigation of how AGIs behave is very useful, but we also have to remember that many AI scientists are overly biased toward "build first; ask questions later" because

• building may be more fun and exciting than worrying about safety (Steven M. Bellovin observed with reference to open-source projects: "Quality takes work, design, review and testing and those are not nearly as much fun as coding".)

• there’s more incentive from commercial applications and government grants to build rather than introspect

• scientists may want AGI sooner so that they personally or their children can reap its benefits.

On a personal level, I suggest that if you really like building systems rather than thinking about safety, you might do well to earn to give in software and donate toward AGI-safety organizations.

40 Next steps

Here are some rough suggestions for how I recommend proceeding on AGI issues and, in [brackets], roughly how long I expect each stage to take. Of course, the stages needn’t be done in a strict serial order, and step 1 should continue indefinitely, as we continue learning more about AGI from subsequent steps.

1. Decide if we want human-controlled, goal-preserving AGI [5-10 years]. This involves exploring questions about what types of AGI scenarios might unfold and how much suffering would result from AGIs of various types.

2. Assuming we decide we do want controlled AGI: Network with academics and AGI developers to raise the topic and canvass ideas [5-10 years]. We could reach out to academic AGI-like projects, including these listed by Pei Wang and these listed on Wikipedia, as well as to machine ethics and roboethics communities. There are already some discussions about safety issues among these groups, but I would expand the dialogue, have private conversations, write publications, hold conferences, etc. These efforts both inform us about the lay of the land and build connections in a friendly, mutualistic way.

3. Lobby for greater funding of research into AGI safety [10-20 years]. Once the idea and field of AGI safety have become more mainstream, it should be possible to differentially speed up safety research by getting more funding for it – both from governments and philanthropists. This is already somewhat feasible; for instance: "In 2014, the US Office of Naval Research announced that it would distribute $7.5 million in grants over five years to university researchers to study questions of machine ethics as applied to autonomous robots."

4. The movement snowballs [decades]. It’s hard to plan this far ahead, but I imagine that eventually (within 25-50 years?) AGI safety will become a mainstream political topic in a similar way as nuclear security is today. Governments may take over in driving the work, perhaps with heavy involvement from companies like Google. This is just a prediction, and the actual way things unfold could be different.

I recommend avoiding a confrontational approach with AGI developers. I would not try to lobby for restrictions on their research (in the short term at least), nor try to "slow them down" in other ways. AGI developers are the allies we need most at this stage, and most of them don’t want uncontrolled AGI either. Typically they just don’t see their work as risky, and I agree that at this point, no AGI project looks set to unveil something dangerous in the next decade or two. For many researchers, AGI is a dream they can’t help but pursue. Hopefully we can engender a similar enthusiasm about pursuing AGI safety.

In the longer term, tides may change, and perhaps many AGI developers will desire government-imposed restrictions as their technologies become increasingly powerful. Even then, I’m doubtful that governments will be able to completely control AGI development (see, e.g., the criticisms by John McGinnis of this approach), so differentially pushing for more safety work may continue to be the most leveraged solution. History provides a poor track record of governments refraining from developing technologies due to ethical concerns; Eckersley and Sandberg (2014, p. 187) cite "human cloning and land-based autonomous robotic weapons" as two of the few exceptions, with neither prohibition having a long track record.

I think the main way in which we should try to affect the speed of regular AGI work is by aiming to avoid setting off an AGI arms race, either via an AGI Sputnik moment or else by more gradual diffusion of alarm among world militaries. It’s possible that discussing AGI scenarios too much with military leaders could exacerbate a militarized reaction. If militaries set their sights on AGI the way the US and Soviet Union did on the space race or nuclear-arms race during the Cold War, the amount of funding for unsafe AGI research might multiply by a factor of 10 or maybe 100, and it would be aimed in harmful directions.

41 Where to push for maximal impact?

Here are some candidates for the best object-level projects that altruists could work on with reference to AI. Because AI seems so crucial, these are also candidates for the best object-level projects in general. Meta-level projects like movement-building, career advice, earning to give, fundraising, etc. are also competitive. I’ve scored each project area out of 10 points to express a rough subjective guess of the value of the work for suffering reducers.

Research whether controlled or uncontrolled AI yields more suffering (score = 10/10)

Pros:

• Figuring out which outcome is better should come before pushing ahead too far in any particular direction.

• This question remains non-obvious and so has very high expected value of information.

• None of the existing big names in AI safety have explored this question because reducing suffering is not the dominant priority for them.

Cons:

• None.

Push for suffering-focused AI-safety approaches (score = 10/10)

Most discussions of AI safety assume that human extinction and failure to spread (human-type) eudaimonia are the main costs of takeover by uncontrolled AI. But as noted in this piece, AIs would also spread astronomical amounts of suffering. Currently no organization besides CLR is focused on how to do AI safety work with the primary aim of avoiding outcomes containing huge amounts of suffering.

approach is to design AIs so that even if theydo get out of control, they "fail safe" in thesense of not spreading massive amounts of suf-fering into the cosmos. For example:1. AIs should be inhibited from colonizing

space, or if they do colonize space, theyshould do so in less harmful ways.

2. "Minimizer" utility functions have lessrisk of creating new universes than "max-imizer" ones do.

3. Simpler utility functions (e.g., creatinguniform paperclips) might require fewer

52

Page 53: Artificial Intelligence and Its Implications for Future ... · CenteronLong-TermRisk 9 Howcomplexisthebrain?15 9.1 Onebasicalgorithm?15 9.2 Ontogeneticdevelopment16 10Brainquantityvs.quality17

Artificial Intelligence and Its Implications for Future Suffering

suffering subroutines than complex util-ity functions would.

4. AIs with expensive intrinsic values (e.g.,maximize paperclips) may run fewer com-plex minds than AIs with cheaper values(e.g., create at least one paperclip on eachplanet), because AIs with cheaper valueshave lower opportunity cost for using re-sources and so can expend more of theircosmic endowment on learning about theuniverse to make sure they’ve accom-plished their goals properly. (Thanks toa friend for this point.) From this stand-point, suffering reducers might prefer anAI that aims to "maximize paperclips"over one that aims to "make sure there’sat least one paperclip per planet." How-ever, perhaps the paperclip maximizerwould prefer to create new universes,while the "at least one paperclip perplanet" AI wouldn’t; indeed, the "one pa-perclip per planet" AI might prefer tohave a smaller multiverse so that therewould be fewer planets that don’t con-tain paperclips. Also, the satisficing AIwould be easier to compromise with thanthe maximizing AI, since the satisficer’sgoals could be carried out more cheaply.There are other possibilities to consideras well. Maybe an AI with the instruc-tions to "be 70% sure of having madeone paperclip and then shut down all ofyour space-colonization plans" would notcreate much suffering (depending on howscrupulous the AI was about making surethat what it had created was really a pa-perclip, that it understood physics prop-erly, etc.).
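To make the opportunity-cost intuition in item 4 concrete, here is a deliberately crude toy model (entirely my own illustration; the numbers and function names are invented and appear nowhere in the original text). The maximizer treats every unit spent on double-checking as a forgone paperclip, while the satisficer, once its cheap goal is met, has nothing better to do with its remaining endowment than verification and learning, which is where large numbers of complex (possibly suffering) subroutines could come in.

    # Toy model with made-up numbers: how two hypothetical AIs might split a
    # normalized cosmic endowment between direct goal pursuit and
    # "double-checking" (learning physics, verifying that the paperclips it
    # made really are paperclips, running complex subroutines, etc.).

    BUDGET = 1.0  # total resources, normalized

    def maximizer_split(budget: float, checking_fraction: float = 0.01) -> dict:
        """Paperclip maximizer: checking competes directly with making more
        paperclips, so only a sliver goes to verification."""
        checking = checking_fraction * budget
        return {"goal_pursuit": budget - checking, "checking": checking}

    def satisficer_split(budget: float, cost_to_satisfice: float = 0.001) -> dict:
        """One-paperclip-per-planet satisficer: once the cheap goal is met,
        leftover resources can all go toward verification and learning."""
        checking = budget - cost_to_satisfice
        return {"goal_pursuit": cost_to_satisfice, "checking": checking}

    print(maximizer_split(BUDGET))   # {'goal_pursuit': 0.99, 'checking': 0.01}
    print(satisficer_split(BUDGET))  # {'goal_pursuit': 0.001, 'checking': 0.999}

On this cartoon picture, the satisficer devotes far more of its endowment to activities that involve running many minds, which is why suffering reducers might, counterintuitively, prefer the maximizer – subject to the caveats about new universes and ease of compromise noted in item 4.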

The problem with bullet #1 is that if you can succeed in preventing AGIs from colonizing space, it seems like you should already have been able to control the AGI altogether, since the two problems appear about equally hard. But maybe there are clever ideas we haven’t thought of for reducing the spread of suffering even if humans lose total control.

Another challenge is that those who don’t place priority on reducing suffering may not agree with these proposals. For example, I would guess that most AI scientists would say, "If the AGI kills humans, at least we should ensure that it spreads life into space, creates a complex array of intricate structures, and increases the size of our multiverse."

Work on AI control and value-loading problems (score = 4/10)

Pros:

• At present, controlled AI seems more likely good than bad.

• Relatively little work thus far, so marginal effort may make a big impact.

Cons:

• It may turn out that AI control increases net expected suffering.

• This topic may become a massive area of investment in coming decades, because everyone should theoretically care about it. Maybe there’s more leverage in pushing on neglected areas of particular concern for suffering reduction.

Research technological/economic/political dynamics of an AI takeoff and push in better directions (score = 3/10)

By this I have in mind scenarios like those of Robin Hanson for emulation takeoff, or Bostrom’s (2004) "The Future of Human Evolution".

Pros:

• Many scenarios have not been mapped out. There’s a need to introduce economic/social realism to AI scenarios, which at present often focus on technical challenges and idealized systems.

• Potential to steer dynamics in more win-win directions.

Cons:

• Broad subject area. Work may be somewhat replaceable as other researchers get on board in the coming decades.

• More people have their eyes on general economic/social trends than on specific AI technicalities, so there may be lower marginal returns to additional work in this area.

• While technological progress is probably the biggest influence on history, it’s also one of the more inevitable influences, making it unclear how much we can affect it. Our main impact on it would seem to come through differential technological progress. In contrast, values, institutions, and social movements can go in many different directions depending on our choices.

Promote the ideal of cooperation on AI values (e.g., CEV by Yudkowsky (2004)) (score = 2/10)

Pros:

• Whereas technical work on AI safety is of interest to and benefits everyone – including militaries and companies with non-altruistic aims – promoting CEV is more important to altruists. I don’t see CEV as a likely outcome even if AI is controlled, because it’s more plausible that individuals and groups will push for their own agendas.

Cons:

• It’s very hard to achieve CEV. It depends on a lot of really complex political and economic dynamics that millions of altruists are already working to improve.

• Promoting CEV as an ideal to approximate may be confused in people’s minds with suggesting that CEV is likely to happen. The latter assumption is probably wrong and so may distort people’s beliefs about other crucial questions. For instance, if CEV was likely, then it would be more likely that suffering reducers should favor controlled AI; but the fact of the matter is that anything more than crude approximations to CEV will probably not happen.

Promote a smoother, safer takeoff for brain emulation (score = 2/10)

Pros:

• As noted above, it’s more plausible that suffering reducers should favor emulation safety than AI safety.

• The topic seems less explored than safety of de novo AIs.

Cons:

• I find it slightly more likely that de novo AI will come first, in which case this work wouldn’t be as relevant. In addition, AI may have more impacts on society even before it reaches the human level, again making it slightly more relevant.

• Safety measures might require more political and less technical work, in which case it’s more likely to be done correctly by policy makers in due time. The value-loading problem seems much easier for emulations because it might just work to upload people with good values, assuming no major value corruption during or after uploading.

• Emulation is more dependent on relatively straightforward engineering improvements and less on unpredictable insight than AI. Thus, it has a clearer development timeline, so there’s less urgency to investigate issues ahead of time to prepare for an unexpected breakthrough.

Influence the moral values of those likely to control AI (score = 2/10)

Pros:

• Altruists, and especially those with niche values, may want to push AI development in more compassionate directions. This could make sense because altruists are most interested in ethics, while even power-hungry states and money-hungry individuals should care about AI safety in the long run.

Cons:

• This strategy is less cooperative. It’s akin to defecting in a tragedy of the commons – pushing more for what you want rather than what everyone wants. If you do push for what everyone wants, then I would consider such work more like the "Promote the ideal of cooperation" item.

• Empirically, there isn’t enough investment in other fundamental AI issues, and those may be more important than further engaging already well-trodden ethical debates.

Promote a singleton over multipolar dynamics (score = 1/10)

Pros:

• A singleton, whether controlled or uncontrolled, would reduce the risk of conflicts that cause cosmic damage.

Unclear:

• There are many ways to promote a singleton. Encouraging cooperation on AI development would improve pluralism and human control in the outcome. Faster development by the leading AI project might also increase the chance of a singleton while reducing the probability of human control of the outcome. Stronger government regulation, surveillance, and coordination would increase chances of a singleton, as would global cooperation.

Cons:

• Speeding up the leading AI project might exacerbate AI arms races. And in any event, it’s currently far too early to predict what group will lead the AI race.

Other variations

In general, there are several levers that we can pull on:

• safety
• arrival time relative to other technologies
• influencing values
• cooperation
• shaping social dynamics
• raising awareness
• etc.

These can be applied to any of

• de novo AI
• brain emulation
• other key technologies
• etc.

42 Is it valuable to work at or influence an AGI company?

Projects like DeepMind, Vicarious, OpenCog, and the AGI research teams at Google, Facebook, etc. are some of the leaders in AGI technology. Sometimes it’s proposed that since these teams might ultimately develop AGI, altruists should consider working for, or at least lobbying, these companies so that they think more about AGI safety.

One’s assessment of this proposal depends on one’s view about AGI takeoff. My own opinion may be somewhat in the minority relative to expert surveys (Müller & Bostrom, 2016), but I’d be surprised if we had human-level AGI before 50 years from now, and my median estimate might be like ∼90 years from now. That said, the idea of AGI arriving at a single point in time is probably a wrong framing of the question. Already machines are super-human in some domains, while their abilities are far below humans’ in other domains. Over the coming decades, we’ll see lots of advancement in machine capabilities in various fields at various speeds, without any single point where machines suddenly develop human-level abilities across all domains. Gradual AI progress over the coming decades will radically transform society, resulting in many small "intelligence explosions" in various specific areas, long before machines completely surpass humans overall.

In light of my picture of AGI, I think of DeepMind, Vicarious, etc. as ripples in a long-term wave of increasing machine capabilities. It seems extremely unlikely that any one of these companies or its AGI system will bootstrap itself to world dominance on its own. Therefore, I think influencing these companies with an eye toward "shaping the AGI that will take over the world" is probably naive. That said, insofar as these companies will influence the long-term trajectory of AGI research, and insofar as people at these companies are important players in the AGI community, I think influencing them has value – just not vastly more value than influencing other powerful people.

That said, as noted previously, early work on AGI safety has the biggest payoff in scenarios where AGI takes off earlier and harder than people expected. If the marginal returns to additional safety research are many times higher in these "early AGI" scenarios, then it could still make sense to put some investment into them even if they seem very unlikely.
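As a back-of-the-envelope illustration (all numbers here are hypothetical, not estimates from this piece): suppose work tailored to the mainline, slower-takeoff scenario is worth R if that scenario obtains and little otherwise, while work tailored to an early hard takeoff is worth 10R conditional on that scenario. With a 10% chance of the early scenario:

    E[early-targeted work] = 0.1 × 10R = 1.0R
    E[mainline-targeted work] = 0.9 × R = 0.9R

so even a scenario judged quite unlikely can dominate the expected-value comparison if the returns within it are sufficiently outsized.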

43 Should suffering reducers focus on AGI safety?

If, upon further analysis, it looks like AGI safety would increase expected suffering, then the answer would be clear: Suffering reducers shouldn’t contribute toward AGI safety and should worry somewhat about how their messages might incline others in that direction. However, I find it reasonably likely that suffering reducers will conclude that the benefits of AGI safety outweigh the risks. In that case, they would face a question of whether to push on AGI safety or on other projects that also seem valuable.

Reasons to focus on other projects:

• There are several really smart people working on AGI safety right now. The number of brilliant altruists focused on AGI safety probably exceeds the number of brilliant altruists focused on reducing suffering in the far future by several times over. Thus, it seems plausible that there remains more low-hanging fruit for suffering reducers in focusing on other crucial considerations rather than delving into the technical details of implementing AGI safety.

• I expect that AGI safety will require at least, say, thousands of researchers and hundreds of thousands of programmers to get right. AGI safety is a much harder problem than ordinary computer security, and computer-security demand is already very high: "In 2012, there were more than 67,400 separate postings for cybersecurity-related jobs in a range of industries". Of course, that AGI safety will need tons of researchers eventually needn’t discount the value of early work, and indeed, someone who helps grow the movement to a large size would contribute as much as many detail-oriented AGI-safety researchers later.

Reasons to focus on AGI safety:


• Most other major problems are also already being tackled by lots of smart people.

• AGI safety is a cause that many value systems can get behind, so working on it can be seen as more "nice" than focusing on areas that are more specific to suffering-reduction values.

All told, I would probably pursue a mixed strategy: Work primarily on questions specific to suffering reduction, but direct donations and resources toward AGI safety when opportunities arise. Some suffering reducers particularly suited to work on AGI safety could go in that direction while others continue searching for points of leverage not specific to controlling AGI.

44 Acknowledgments

Parts of this piece were inspired by discussions with various people, including David Althaus, Daniel Dewey, and Caspar Oesterheld.

References

Armstrong, S., Soares, N., Fallenstein, B., & Yudkowsky, E. (2015). Corrigibility. In AAAI Publications. Austin, TX, USA.

Bloom, P. (2013). Just babies: The origins of good and evil. New York: Crown.

Bostrom, N. (2003). Astronomical waste: The opportunity cost of delayed technological development. Utilitas, 15(3), 308–314.

Bostrom, N. (2004). The future of human evolution. In C. Tandy (Ed.), Death and Anti-Death: Two Hundred Years After Kant, Fifty Years After Turing (pp. 339–371). Palo Alto, California: Ria University Press.

Bostrom, N. (2006). What is a singleton? Linguistic and Philosophical Investigations, 5(2), 48–54.

Bostrom, N. (2010). Anthropic bias: Observation selection effects in science and philosophy (1st ed.). Abingdon, Oxon: Routledge.

Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

Bostrom, N., & Yudkowsky, E. (2014). The ethics of artificial intelligence. In K. Frankish & W. M. Ramsey (Eds.), The Cambridge Handbook of Artificial Intelligence (pp. 316–334). Cambridge University Press.

Brooks, F. P., Jr. (1995). The Mythical Man-Month (Anniversary ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.

Calverley, D. J. (2005). Android science and the animal rights movement: are there analogies. In Cognitive Sciences Society Workshop, Stresa, Italy (pp. 127–136).

Davis, E. (2015). Ethical guidelines for a superintelligence. Artificial Intelligence, 220, 121–124.

Dennett, D. C. (1992). Consciousness Explained (1st ed.). Boston: Back Bay Books.

Eckersley, P., & Sandberg, A. (2014). Is brain emulation dangerous? Journal of Artificial General Intelligence, 4(3), 170–194.

Goertzel, B. (2014). GOLEM: Towards an AGI meta-architecture enabling both goal preservation and radical self-improvement. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 391–403.

Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. Advances in Computers, 6, 31–88.

Good, I. J. (1982). Ethical machines. In Tenth Machine Intelligence Workshop, Cleveland, Ohio (Vol. 246, pp. 555–560).

Hall, J. S. (2008). Engineering utopia. Frontiers in Artificial Intelligence and Applications, 171, 460.

Hanson, R. (1994). If uploads come first. Extropy, 6(2), 10–15.

Hanson, R., & Yudkowsky, E. (2013). The Hanson-Yudkowsky AI-Foom debate. Berkeley, CA: Machine Intelligence Research Institute.

Kaplan, R. D. (2013). The revenge of geography: What the map tells us about coming conflicts and the battle against fate (Reprint ed.). Random House Trade Paperbacks.

Kurzweil, R. (2000). The Age of Spiritual Machines: When Computers Exceed Human Intelligence. New York: Penguin Books.

Minsky, M. (1984). Afterword to Vernor Vinge’s novel, "True Names." Unpublished manuscript.

Müller, V. C., & Bostrom, N. (2016). Future progress in artificial intelligence: A survey of expert opinion. In V. C. Müller (Ed.), Fundamental issues of artificial intelligence (pp. 553–571). Berlin: Springer.

Ng, A. Y., & Russell, S. J. (2000). Algorithmsfor inverse reinforcement learning. In(pp. 663–670).

Russell, S. J., Norvig, P., Canny, J. F., Malik, J. M., & Edwards, D. D. (2003). Artificial intelligence: A modern approach (Vol. 2). Upper Saddle River, NJ: Prentice Hall.

Salamon, A., & Muehlhauser, L. (2012). Singularity Summit 2011 workshop report (Technical Report No. 1). San Francisco, CA: The Singularity Institute.

Sotala, K. (2012). Advantages of artificial intelligences, uploads, and digital minds. International Journal of Machine Consciousness, 4(1), 275–291. doi:10.1142/S1793843012400161

Stock, G. (1993). Metaman: The merging of humans and machines into a global superorganism. New York: Simon & Schuster.

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.

Winnefeld, J. A., & Kendall, F. (2013). Unmanned systems integrated roadmap FY 2013-2036. Office of the Secretary of Defense, US.

Yudkowsky, E. (2004). Coherent extrapolated volition. Singularity Institute for Artificial Intelligence.

Yudkowsky, E. (2011). Complex value systems in friendly AI. In D. Hutchison et al. (Eds.), Artificial General Intelligence (Vol. 6830, pp. 388–393). Springer Berlin Heidelberg.

Yudkowsky, E. (2013). Intelligence explosion microeconomics. Machine Intelligence Research Institute, accessed online October 23, 2015.
