My current high-level strategic picture of the world

Follow up to: My strategic picture of the work that needs to be done, A view of the main kinds of problems facing us

This post outlines my current epistemic state regarding the most crucial problems facing humanity and the leverage points that we could, at least in principle, intervene on to solve them. 

This is based on my current off-the-cuff impressions, as opposed to careful research. Some of the things that I say here are probably importantly wrong (and if you know that to be the case, please let me know). 

My next step here is to more carefully research the different “legs” of this strategic outline, shoring up my understanding of each and clarifying my sense of how tractable each one is as an intervention point.

None of this constitutes a plan. Rather, it is a first sketch, meant to facilitate more detailed elaboration.

The Goal

My overall goal here is to explore the possible ways by which humanity achieves existential victory. By existential victory, I mean:

The human race[1] survives the acute risk period, and enters a stable period in which we (or our descendants) are able to safely reflect on what a good universe entails, and then act to make the reachable universe good.

This entails humanity surviving all existential risks and getting to a state where existential risk is minimized (for instance, because we are now protected from most disasters by an aligned superintelligence or by a coalition of aligned superintelligences).

Possibly, there is an additional constraint: that the human race not just survive, but remain “healthy” along some key dimensions, such as control over our world, intellectual vigor, and freedom from oppressive power structures and trauma, if detriments along those dimensions are irreparable and would therefore permanently limit our ability to reflect on what is Good.

This document describes the two basic trajectories that I can currently see, by which we might systematically achieve that goal (as opposed to succeeding by luck).

The Two Problems

In order to get to that kind of safe attractor state, there appear to be two fundamental classes of problems facing humanity: technical AI alignment and civilizational sanity.

By “technical AI alignment”, I mean the problem of discovering how to build and deploy super-humanly powerful AI systems (embodied either in a singleton or in an “ecosystem” of AIs) safely: in a way that doesn’t drive humanity extinct, and that broadly leaves humans in control of the trajectory of the universe.

By “civilizational sanity”, I mean to point at the catch-all category of whatever causes high-leverage decision makers to make wise, scope-sensitive, non-self-destructive choices.

Civilizational Sanity includes whatever factors would cause your society to do things like “saving ~500,000 lives by running human challenge trials on all existing COVID vaccine candidates in February 2020, scaling up vaccine production in parallel with market mechanisms, and then administering vaccinations, en masse, to everyone who wants them, with minimal delay”, or something at least that effective, instead of what our society actually did.

It also includes whatever it takes for a government to successfully identify and carry through good macroeconomic policy (which I’ve heard is NGDP targeting, though I don’t personally know).

And it includes whatever factors cause it to be the case that your civilization suddenly acquiring god-like powers (via transformative AI or some other method), results in increased eudaimonia instead of in some kind of disaster.

I think that the only shot we have of exiting the critical risk period by something other than luck is sufficient success at solving AI alignment or sufficient success at solving civilizational sanity, and implementing our solution.

(The “Strategic Background” section of this post from MIRI outlines a perspective on the high-level problem similar to the one in this document. However, it elaborates, in more detail, a path by which AI alignment would allow humanity to exit the acute risk period (minimally aligned AI -> AGI-powered technological development -> risk-mitigating technology -> pivotal act that stabilizes the world), and de-emphasizes broad-based civilizational sanity improvements as another path out of the acute risk period.)


To some degree, solutions to either technical alignment or civilizational sanity can substitute for each other, insofar as a full solution to one of these problems would approximately obviate the need for solving the other.

For instance, if we had a full and complete understanding of AI alignment, including rigorous proofs and safe demonstrations of alignment failures, fully-worked-out safe engineering approaches, and crisp theory tying it all together, we would be able to exit the critical risk period. 

Even if it weren’t practical for a small team to code up an aligned AI and foom, with that level of detail it would be easy to convince the existing AI community (or perhaps just the best-equipped team) to build aligned AI, because one could make the case for the danger of conventional approaches very strongly, and provide a crisply defined alternative.

On the flip side, at some sufficiently high level of global civilizational sanity, key actors would recognize the huge cost of unaligned AI, and successfully coordinate to prevent anyone from building unaligned AI until alignment theory has been worked out.

We can make partial progress on either of these problems. The task facing humanity as a whole is to make sufficient progress on one, the other, or both, of these problems in order to exit the acute risk period. Speaking allegorically, we need the total progress on both to “sum to 1.” [2]

A note on “sufficiency”

Above, I write “I think that the only shot we have of exiting the critical risk period by something other than luck is sufficient success at solving AI alignment or sufficient success at solving civilizational sanity…”.

I want to clearly highlight that the word “sufficient” is doing a lot of work in that sentence. “Sufficient” progress on AI alignment or “sufficient” progress on civilizational sanity is not yet operationalized enough to be a target. I don’t know what constitutes “enough” progress on either one of these, and I don’t know if I could recognize it if I saw it. 

Civilizational Sanity, in particular, is always a two-place function: I can only judge a civilization to be insane relative to my own epistemic process. If societal decision making improves, but my own process improves even faster, the world will still seem mad to me from my new, more privileged vantage point. So in that sense, the goal posts should be constantly moving.

My key claim is only that there is some frontier defined by these axes such that, if the world moves past that frontier, we will be out of the acute risk period, even though I don’t know where that frontier lies.  

A note on timelines

When I talk about civilizational sanity interventions as a line of attack on AI risk, folks often express skepticism that we have enough time: AI timelines are short, so short that it seems unlikely that plans to reform the decision making processes of the whole world will bear fruit before the zero hour. [3]

I think that this is wrong-headed. It might very well be the case that we don’t have time for any sufficiently good general sanity boosting plans to reach fruition. But it might just as well be the case that we don’t have time for our technical AI alignment research to progress enough to be practically useful.

Our basic situation (I’m claiming) is that we need to get either to a correct alignment theory or to a generally sane civilization before the transformative AI countdown reaches 0. But we don’t know how long either of those projects will take. Reforming the decision processes of the powerful places in the world might take a century or more, but so might solving technical alignment.

Absent more detailed models about both approaches, I don’t think we can assume that one is more tractable, more reliable, or faster, than the other.

AI alignment in particular?

This breakdown is focused on the AI alignment problem in particular (it takes up half of the problem space), giving the impression that AI risk is the only, or perhaps the most dangerous, existential risk.

While AI risk does seem to me to pose the plurality of the risk to humanity, that isn’t the main reason for breaking things down in this way. 

Rather, it’s that every intervention I can see that has a shot at moving us out of the acute risk period goes through powerful AI, a much saner civilization, or both. [I would be excited to hear counterexamples, if you can think of any.]

We need protection against bio-risk, nuclear war, and civilizational collapse / decline. But robust protection against any one of those doesn’t protect us from the others by default. Aligned AI and a robustly sane civilization are both general enough that a sufficiently good version of either one would eliminate or mitigate the other risks. Any other solution-areas that have that property, and that don’t flow through aligned AI or a generally sane civilization, would deserve their own treatment in this strategic map, but as of yet I can’t think of any.

Technical AI alignment

I don’t have much to say about the details of this project. In broad strokes, we’re hoping to get a correct enough philosophical understanding of the concepts relevant to AI alignment, formalize that understanding as math, and eventually develop those formalizations into practical engineering approaches. (Elaboration on this trajectory here.)

(There are some folks who are going straight for developing engineering frameworks [links], hoping that they’ll either work, or give us a more concrete, and more nuanced understanding of the problems that need to be solved.)

It seems quite important to find out whether there are better or faster ways to make progress here. But my current sense is that it is just a matter of people doing the research work, plus recruiting more people who can do the research work. See my diagram here.

Civilizational Sanity

Follow up to: What are some Civilizational Sanity Interventions

This second category is much less straightforward. 

Within the broad problem space of “causing high-level human decision making to be systematically sane”, I can see a number of specific lines of attack, but I have wide error bars on how tractable each one is.

Those lines of attack are

  1. Unblocking governance innovation
  2. Powerful intelligence enhancement
  3. Reliable, scalable, highly effective resolution of psychological trauma
  4. Chinese ascendancy

I’m sure this list isn’t exhaustive. These four are the only interventions that I currently know of that seem like (from my current epistemic state) they could transform society enough that we could, for instance, handle AI risk gracefully. 

Relationship between these legs

In particular, there’s an important open question of how these approaches relate to each other and to the broader civilizational sanity project.

I described above that I think that “AI alignment” and “civilizational sanity” have an “or” or a “sum” relationship: sufficient progress on only one of them can allow us to exit the critical risk period.

There might be a similar relationship between the following civilizational sanity interventions: pushing on any one of them, far enough, leads to a large jump in civilizational sanity, kicking off a positive feedback loop. OR it might instead be that only some of these approaches attack the fundamental problem, and without success on that one front, we won’t see large effects from the others.

Unblocking Innovation in Governance

Better Governance is Possible

The most obvious way to improve the sanity of high-leverage decisions on planet earth is governmental reform.

Our governmental decision making processes are a mess. National politics is tribal politics writ large: instead of a society-level epistemology trying to select the best policies, we have a bludgeoning match over which coalitions are best and which people should be in charge. Politicians are selected on the basis of electability, not expertise or even alignment with society, yet somehow we keep ending up with candidates that no one is enthusiastic about. Congress is famously in a semi-constant self-stranglehold, unable to get anything done. And the constraints of politics force those politicians to say absurd things in contradiction with, for instance, basic economic theory, and to grandstand about things that don’t matter and (even worse) things that do.

The current system has all kinds of analytically demonstrable game-theoretic drawbacks that make undesirable outcomes all but inevitable, including a two-party system that no one likes much, principal-agent problems between the populace and the government, and net societal losses from the allocation of benefits to special interest groups.

There hasn’t been a major innovation in high-level governance since the invention and wide-scale deployment of democracy in the 18th century. It seems like we can do better. We could, in principle, have governmental institutions that are effective epistemologies: able to identify problems, and to determine and act on policies at the frontier of society’s various tradeoffs, instead of at the frontier of the tradeoffs of political expediency.

And because governments have so much influence, more effective information processing in that sector could lead to better institution designs in all other sectors. Public policy is in part a matter of creating and regulating other institutions. Saner government decision making entails setting up efficient and socially beneficial incentives for health, education, etc., which selects for effective institutions in those more specific sectors. In this way, government is a meta-institution that shapes other institutions. (It’s unclear to me to what degree this is true. How much does better policy at the governmental level automatically correct the inefficiencies of, say, the medical bureaucracy?)

One might therefore think that a particularly high leverage intervention is to develop new systems of governance. But humanity has a pretty large backlog of governance innovations that seem much better than our current setups on a number of dimensions, from the simple, like using Single Transferable Vote instead of First Past the Post, to the radical, like Futarchy, or the abolition of private property in favor of a COST system.
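To make the voting-method contrast above concrete, here is a minimal sketch of instant-runoff counting, the single-winner case of Single Transferable Vote. The candidate names and ballot counts are invented for illustration; real STV elections add multi-seat quotas and tie-breaking rules that this sketch omits:

```python
from collections import Counter

def instant_runoff(ballots):
    """Single-winner instant-runoff count (the one-seat case of STV).

    `ballots` is a list of rankings, each a list of candidate names in
    preference order. The weakest candidate is repeatedly eliminated and
    their ballots transfer to each voter's next surviving choice, until
    some candidate holds a majority of the live ballots.
    """
    surviving = {c for b in ballots for c in b}
    while True:
        # Count each ballot for its highest-ranked surviving candidate.
        tally = Counter(
            next(c for c in b if c in surviving)
            for b in ballots
            if any(c in surviving for c in b)
        )
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(surviving) == 1:
            return leader
        # Eliminate the candidate with the fewest current first choices.
        surviving.remove(min(surviving, key=lambda c: tally.get(c, 0)))

# Hypothetical electorate: under First Past the Post, A wins on first
# choices alone (4 vs 3 vs 2). Under instant-runoff, C is eliminated,
# C's voters transfer to B, and B wins with a 5-4 majority.
ballots = [["A", "C", "B"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
winner = instant_runoff(ballots)
```

The point of the example: the two mechanisms disagree on the winner given the exact same voter preferences, which is the kind of legible difference that makes these alternatives comparable in practice.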

It seems to me that the bottleneck for better governmental systems is not a lack of possible alternatives, but rather the opportunity to experiment with those alternatives. Apparently, there are approximately no venues available for governmental innovation on planet earth.

This is not very surprising: incumbents benefit from the existing power structure and therefore oppose replacing it with a different mechanism. In general, everyone who has the ability to gatekeep experiments with new governance mechanisms has an incentive to treat those experiments as threats.

However, widespread experimentation and innovation in governance would likely be a huge deal, because it would allow humanity as a whole to identify the most successful mechanisms, which, having been shown to work, could be tried at larger scales, and eventually widely adopted.

Experimentation Leads to Eventual Wide Adoption

The basic argument that merely allowing experimentation will eventually lead to better governance on a global scale is as follows: 

Many governance mechanisms, if tried, will not only 1) surpass existing systems, but 2) surpass them in a legible way, both in aggregate outcomes (like economic productivity, employment, and tax rates) and through direct engagement with those systems (for instance, once voters become familiar with Futarchy, it might seem absurd to elect individuals who are both supposed to represent one’s values and to have good plans for achieving those values).

If the condition of “legible superiority” holds, there would be pressure to replicate those mechanisms elsewhere, at all different scales. Eventually, the best innovations simply become the new standard practices.

Similarly, for many incentive-aligning interventions, not using such methods is a stable attractor: it is in the interests of those in power to resist their adoption. But widespread use of such methods is also a stable attractor: once they are common, it is in the interests of those in power to keep using them. As Robin Hanson says of prediction markets:

I’d say if you look at the example of cost accounting, you can imagine a world where nobody does cost accounting. You say of your organization, “Let’s do cost accounting here.”

That’s a problem because you’d be heard as saying, “Somebody around here is stealing and we need to find out who.” So that might be discouraged.

In a world where everybody else does cost accounting, you say, “Let’s not do cost accounting here.” That will be heard as saying, “Could we steal and just not talk about it?” which will also seem negative.

Similarly, with prediction markets, you could imagine a world like ours where nobody does them, and then your proposing to do it will send a bad signal. You’re basically saying, “People are bullshitting around here. We need to find out who and get to the truth.”

But in a world where everybody was doing it, it would be similarly hard not to do it. If every project with a deadline had a betting market and you say, “Let’s not have a betting market on our project deadline,” you’d be basically saying, “We’re not going to make the deadline, folks. Can we just set that aside and not even talk about it?”

This may generalize to many institution designs that are better than the status quo.

For these reasons, finding ways around the general moratorium on governmental innovation, so that new governance mechanisms can be tried, could pay huge dividends.

Strategies to allow for Experimentation

Currently, the only approaches I’m aware of for creating spaces for governmental innovation are charter cities and seasteading.

Charter cities are bottlenecked on legal restrictions, and on the practical coordination problem of getting a critical mass of residents. But I’m hopeful that COVID has caused a permanent shift to remote work, which will give people more freedom in where to live and increase competition-in-governance between cities and states that want to attract talent.

Seasteading is currently bottlenecked on the engineering problem of creating livable floating structures, cheaply enough to be scalable. [Double check if cost is actually the key concern.]

Repeatable reform templates

I wonder if there might be a third, more abstract, line of attack on unblocking governance innovation: developing a repeatable method for changing existing governmental structures in a way that gives powerful incumbents an incentive to cooperate.

If it were possible to simply buy out incumbents and overhaul the system, that might be a huge opportunity. However, my guess is that in most liberal democracies this is both illegal and generally repugnant (plus politicians are beholden to their party, which might object), such that existing power-holders would not accept a straightforward “money for institutional reform” trade.

But there may be some other version which, in practice, incentivizes power-holders to initiate governmental reform. Possibly by letting those power-holders keep their power for some length of time, and also receive the credit for the change. Or maybe a setup that targets those people before they take power, when they are more idealistic and more inclined to agree to cause reform, conditional on all their peers doing the same, in the style of a free-state agreement.

If we could find a repeatable “template” for making such deals, it might unlock the ability to iteratively improve existing governmental structures.

I’m not aware of any academic research in this area (both historical case studies of how these kinds of shifts have occurred in the past, and analytic models of how to incentivize such changes, seem quite useful to me), nor of any practical projects aiming for something like this.

Intelligence enhancement

One might posit that the sort of incentive problems that lead to bizarre institutional policies are the inevitable result of the fact that doing better requires understanding many abstract, non-intuitive concepts and/or careful reasoning in complicated domains, and the average person is of average intelligence, which is insufficient to systematically distinguish better policies and institutional set-ups from worse ones at the current margin.

In this view, the fundamental problem is that our civilizational decision making processes are much worse than is theoretically possible, because we are collectively not smart enough to do better. Some of us can identify the best policies (or at least determine that one policy is better than another), some of the time, but that relies on understanding that is esoteric to many more people, including many crucial decision makers.

But if the average intelligence of the population as a whole were higher, more good ideas would seem obviously good to more people, and it would be substantially easier to get a critical mass of acceptance for sane policies at the object level, as well as for better information-processing mechanisms. (For instance, if the IQ curve were shifted 35 points to the right, many more people would be able to “see at a glance” why prediction markets are an elegant way of aggregating information.)

More intelligence -> More understanding of important principles -> Saner policies
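To put rough numbers on the parenthetical above: modeling IQ as normally distributed with standard deviation 15, a 35-point rightward shift turns a small tail into a solid majority. The threshold of 130 is my illustrative choice, not something from the text; a minimal sketch:

```python
from math import erf, sqrt

def fraction_above(threshold, mean=100.0, sd=15.0):
    """Fraction of a normally distributed IQ curve above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Share of people above IQ 130 (an illustrative bar for "sees at a
# glance why prediction markets work") before and after shifting the
# whole curve 35 points to the right.
before = fraction_above(130, mean=100)  # ~2.3% of the population
after = fraction_above(130, mean=135)   # ~63% of the population
```

The shift multiplies the population clearing that bar by roughly a factor of 25-30, which is the sense in which a shifted curve could change which policies are “obviously good” to a critical mass of people.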

So it might be that the most effective lever on civilizational sanity is intervening on biological intelligence.

The most plausible way to do this is via widespread genetic enhancement, with either selection methods like iterated embryo selection, or direct gene editing using methods like CRISPR.

My current understanding is that these methods are bottlenecked on our knowledge of the genetic predictors of intelligence: if we knew those more completely, we would basically be able to start human genetic engineering for intelligence. It seems like that knowledge will continue to trickle in as we get better at genomic analysis and collect larger and larger data sets. [Note: this is just my background belief. Double-check.] Possibly, better machine learning methods will lead to a sudden jump in the rate of progress on this project?

On the face of it, this suggests that any project that could provide a breakthrough in decoding the genetic predictors of intelligence could be high leverage.

Aside from that, there’s some risk that society will fork down a path in which human genetic enhancement is considered unethical and is banned. I’m not that worried about this possibility: as long as some people or groups are enhancing their children, there is competitive pressure to do the same. And I think it is pretty unlikely that China, which is competitive with the rest of the world at the national level, and in which families already regularly exert huge efforts to give their children competitive advantages, will forgo this opportunity. If China invests in human genetic enhancement, the US will do the same out of fear of Chinese dominance.

Some other avenues for human intelligence enhancement include nootropics, which seem much less promising because of the basic algernonic argument, and brain-computer interfaces like Neuralink. For the latter, it is currently unknown which dimensions of human cognition can be readily improved, and whether such augmentation will lend itself to wisdom, or whatever the precursors to civilizational sanity are.

There’s also the possibility of using sufficiently aligned AI assistants to augment our effective intelligence and decision making. Absent our alignment research giving us very clear criteria for aligned systems, this seems like a very tricky proposition, because of the problems described in this post. But in worlds where AI technology continues to improve along its current trajectory, it might be that using limited AI systems as leverage for improving our decision making and research apparatus, to further improve our alignment technologies, is the best way to go.

A note on improving public understanding by methods other than intelligence enhancement:

Possibly there are other ways to substantially increase each person’s individual intellectual reach, so that we can all come to understand more, without increasing biological intelligence. Things in the vein of “better education”. 

I’m pretty dubious of these. 

I think I have far above average skill in communicating (both teaching and being taught) complex or abstract ideas. But even for someone pretty skilled, for a human, it is just hard. Even when the conditions are exceptional (a motivated student working one-on-one with a skilled tutor who understands the material and can model and pace to the student’s epistemic state), it just takes many focused hours to grasp many important concepts.

I think that any educational intervention effective enough to actually move the needle on civilizational sanity would have to be very radical: so transformative that it would be a general boost in a person’s learning ability, i.e. an increase in effective intelligence. That said, if anyone has ideas for interventions that could increase most people’s intellectual grasp, I would love to hear them.

(…Possibly a dedicated and well-executed campaign to educate the public at large in some small set of extremely important concepts, with the goal of shifting what sorts of explanations sound plausible to most people (raising the standard for what kinds of economic claims people can make in public with a straight face, for instance), would be helpful on the margin. But this seems to me like an enormous undertaking, one requiring pedagogical and mass-communication knowledge that I don’t know if anyone has. And I’m not sure how helpful it would be. Even if the whole world understood econ 101, the real world is more complicated than econ 101, so I don’t know how much that alone would aid people’s assessment of which policies are best. I suppose it would cut out some first-order class of mistakes.)

I do think there are ways to increase our collective intellectual reach, so that societies can systematically land on correct conclusions, without increasing any individual person’s intellectual reach or understanding. These include the governance mechanisms I alluded to in the last section.

There might also exist society-wide “public services” that could do something like this while side-stepping government bureaucracy entirely, like the dream of Arbital. I’m not sure how optimistic I should be about those kinds of interventions. The only comparable historical examples that I can think of are Wikipedia and public libraries. Both seem like clearly beneficial public goods with huge flow-through effects, making information easily available to people who want it and wouldn’t otherwise have access. But neither one seems to have obviously improved high-level civilizational decision making relative to the counterfactual.

Clearing “Trauma”??

[The following section is much more speculative, and I don’t yet know what to think of it.]

There’s another story in which the main source of our world’s dysfunction is self-perpetuating trauma patterns. 

There are many variations of this story, which differ in important details. I’ll outline one version here, noting that something like this could be true without this particular story being true.

According to this view…

virtually everyone is traumatized (or, if you prefer, “socialized”) into dysfunctional and/or exploitative behavior patterns, to a greater or lesser degree, in the course of growing up.

The central problem isn’t (just) that everyone is following their local self-interest within globally destructive systems; it is actually much worse than that: people are conditioned in such a way that they are not even acting in their narrow self-interest. Instead, humans myopically focus on goals, and execute strategies, that are both 1) globally harmful and 2) not even aligned with their own “true” reflective preferences, due to false assumptions underlying their engagement with the world. This myopia also inhibits their ability to think clearly about parts of the world that are related to their trauma.

(As a case in point, I think it is probably the case that there are lots of people aggressively pursuing AGI who instinctively flinch away from any thought that AGI might be dangerous, because they have a deep, unarticulated belief that if they can be successful at that, their parents will love them, or they won’t feel lonely any more, or something like that.)

They’ve been conditioned to feel threatened or triggered by a huge class of behaviors that are globally productive, like accurate tracking of harms and many kinds of positive-sum arrangements.

Furthermore, the core reason why most people can’t seem to think or to have “beliefs in the sense of anticipations about the world” is not (mostly) a matter of intelligence, but rather that their default reasoning and sense-making functions have been damaged by the institutions and social contexts in which they participate (school, for instance).

Those traumatizing contexts are not designed by conscious malice, but they are also not necessarily incidental. It’s possible that they have been optimized to be traumatizing, via unconscious social hill-climbing.

This is because trauma-patterns are replicators: they have enough expressive power to recreate themselves in other humans, and are therefore subject to a selection pressure that gradually promotes the variations that are most effective at propagating themselves. (Furthermore, there’s a hypothesis that for a traumatized mind, one of the best ways to control the environment to make it safe is to similarly traumatize people in the environment.) The net result is horrendous systems of hurt people hurting people, as a way to pass on that particular flavor of hurt to future generations.

Part of the hypothesis here is that these trauma patterns have always been a thing in human societies, but there has also typically been a counter-force, namely that if you need to work together and have a good understanding of the physical world to survive in a harsh environment, your epistemology can’t be damaged too badly, or you’ll die. But in the modern world, we’ve become so wealthy, and most people have become so divorced from actual production, that that counter-force is much diminished.

Implications for Improving the World

If this story is true, governmental reform is likely to fail for seemingly mysterious reasons, because there is selection pressure optimizing against good institutional epistemology, over and above bureaucratic inertia and the incentives of entrenched power-holders. If you don’t defuse the underlying trauma-patterns, any system you try to build will either fail or be subverted by them.

And under this story, it’s unclear how much intelligence enhancement would help. All else being equal, it seems (?) that being smarter helps in developmental work and in healing from one’s personal traumas, but it might also be the case that greater intelligence enables more efficient propagation of trauma patterns.

If this story is largely correct, it implies that the actual bottleneck for the world is understanding trauma and trauma resolution methods well enough to heal trauma-patterns at scale. If we can do that, the agency and intellect of the world (which is currently mostly suppressed), will be unblocked, and most of the other problems of the world will approximately solve themselves.

I also don’t know to what extent there already exist methods for reliably and rapidly resolving trauma patterns, or the degree to which the bottleneck is actually one of 1-to-n scaling rather than 0-to-1 discovery. Certainly there are various methods that at least some people have gotten at least some benefit from, though it remains unclear how much of the total potential benefit even the best methods provide to the people who have gotten the most from them.

I don’t know what to think of all of this yet: the degree to which trauma is at the root of the world’s ills, the degree to which things have actually been optimized to be traumatizing as opposed to ending up that way by accident, or even whether “trauma” is a meaningful category pointing at a real phenomenon that is different from “learning” in a principled way.

I’ll note that even if the strong version of this story is not correct, it might still be the case that many people’s intellectual capability is handicapped by psychological baggage. If so, research into effective trauma-resolution methods may be an effective line of attack on improving the world’s intellectual capability. For instance, finding even a non-scalable method for reliably resolving trauma might be an important win, because at minimum, we could apply it to all of the AI safety researchers. This might be one of the possible gains on the table for speeding progress on the alignment problem.

(Though this is also something to be careful of, since such methods would likely have some kind of psychological side effects, and we don’t necessarily want to reshape the psyches of earth’s contingent of alignment researchers all in the same way. I worry that we might have already done this to some degree with circling: Circling seems quite good and quite helpful, but I think that we should be concerned that if we make a mistake about what directions are good to push the culture of our small AI safety community, we’re likely to destroy a lot of value.)

The Rise of China??

In the first section describing what I meant by civilizational sanity up above, I noted “sensible response to COVID” as one indicator of civilizational sanity. Notably, China’s COVID response seems, overall, to have been much more effective than the West’s.

This doesn’t seem like an aberration, either. As a non-expert foreigner looking in, China’s society/government seems overall more like an agent than the US government. It seems possible to imagine the PRC having a coherent “stance” on AI risk. If Xi Jinping came to the conclusion that AGI was an existential risk, I imagine that that could actually be propagated through the Chinese government, and Chinese society, in a way that has a pretty good chance of leading to strong constraints on AGI development (like the nationalization, or at least the auditing, of any AGI projects).

Whereas if Joe Biden, or Donald Trump, or anyone else who is anything close to a “leader of the US government”, got it into their head that AI risk was a problem…the issue would immediately be politicized, with everyone in the media taking sides on one of two lowest-common-denominator narratives, each straw-manning the other. One side would attempt to produce (probably senseless) legislation in the frame of preventing the bad guys from doing bad things, while the other side goes to absurd lengths to block them as a matter of principle. In the end we’d be left with some regulation of tech companies that doesn’t cleave to the actual shape of the problems at all, and that alienates researchers who are frustrated that this anthropomorphizing “AI risk” hubbub just made their lives much harder.

(One might think that this is actually a national security issue, and so would be taken more seriously than that, but COVID was a huge public health issue, and we managed to politicize wearing masks.)

So, maybe it would be good for the world if China were the dominant world power?

I think that overall, China’s society and high-level decision-making are currently saner than those of the western world. So maybe on the margin, the world is better off if China were more dominant.

However, I have a number of reservations.

  1. China’s human rights record is not great. Apparently, there is an ongoing genocide of the Uighurs happening right now. My deontology is pretty reluctant to put mass murderers in charge of the world.
    1. I’m not sure how to think about this. Genocide is extremely bad. And furthermore we have a strong, coordinated norm to censure and take action against it (although, obviously not that strong, since I don’t know of a single person who has taken any action other than (occasionally) tweeting news articles, in this case). But also, I’m not sure whether I should just parse this as standard practice for great powers / ruling empires. The US has committed similarly bad atrocities in its history (slavery and the extermination/relocation of the Indians come to mind), and as far as I know, continues to commit similar atrocities. And the stakes are literally astronomical. Does the specter of extinction and the weight of all future earth-originating civilization mean we should just neglect contemporary genocide in our realpolitik calculations? I’m not comfortable with that, but I don’t know what to think about it.
  2. I don’t have a strong reason to expect that China’s institutions are fundamentally better-functioning than the US’s; I think they’re just younger. If China is exhibiting the kind of functionality and decisiveness that the US was enjoying 60 years ago, then it seems pretty plausible that 60 years from now (or maybe sooner than that, on the general principle that the world is moving faster now), the Chinese system will be similarly sclerotic and dysfunctional.
    1. Indeed, we might make a more specific argument that institutions are able to remain functional so long as there is growth, because a growing pie means everyone can win. But when growth slows or stops, there’s no longer a selection pressure for effectiveness, and institutions entrench themselves, because rent-seeking is a better strategy. (Or maybe the causality goes the other way: there’s a continual, gradual increase in rent-seeking as actors entrench their power-bases, which gradually crowds out production, until all (or almost all) that’s left is rent-seeking.) In any case, I think China has got to be nearing the top of its explosive s-curve, and I don’t expect its national agency to be robust to that.
  3. I would guess, not knowing much more than a stereotype of Chinese culture, that even if it is saner and more effective than western culture right now, the west has more of the generators that can lead to further increases in civilizational sanity. I might be totally off base here, but the East’s emphasis on conformity and social hierarchy seems like it would make it even MORE resistant to, say, the wide-scale adoption of prediction markets than the US is. (Though maybe the ruling party is enough of an unincentivized incentivizer to overcome this effect?) I suspect that it is even less likely to generate the kind of iconoclastic thinkers who would think up the idea of prediction markets in the first place. It would be quite bad if we got some boost in civilizational sanity with the rise of China, but Chinese dominance then curtailed any further improvement on that dimension.
  4. It is currently unclear to me how much it matters which culture the intelligence explosion takes place in.
    1. Under the assumption of a strong attractor in the human CEV, it seems like it doesn’t matter much at all: we’re all, currently, so radically confused about Goodness that the apparently-huge cultural differences are just noise. And even if that’s not true, I would guess that the differences between my ideal future and some human-descended society are probably massively outweighed by the looming probability of extinction and a sterile universe. Chinese people live happy lives in China now, and have lived happy lives throughout history, even if they tolerate a level of conformity and restriction-of-expression that I would find stifling, to say the least.
    2. However, I think it might not be an exaggeration to say that the CPC believes that thoughts should be censored to serve the state. I can imagine technologically augmented versions of thought control so severe as to permanently damage human civilization’s ability to think together, which might constitute the sort of irreparable “damage” that prevents us from deliberating to discover, and then executing on, a good future. If this sort of technology is more likely to come from China than from the west, Chinese supremacy might be disastrous.
    3. It does seem really important that AGI not lock the future into an inescapable immortal dictatorship (Probably? Maybe most people just live basically happy lives in an immortal dictatorship?). And I want to track if that is more likely to result from an intelligence explosion directed by China than by my native culture.

Summing up

  • The problem facing humanity in this era is figuring out how to exit the acute risk period, systematically, instead of by luck. 
  • The only ways that I can see to do this depend on aligned AI or a much saner human civilization.
  • So the problem breaks down into two subproblems: solve AI alignment or achieve enough civilizational sanity.
  • AI alignment research is going apace, and if there are ways to speed it up, that would be great.
  • I can currently see four lines of attack on civilizational sanity: unblocking innovation in governance, intelligence enhancement, widespread trauma resolution, and (possibly) Chinese ascendancy.
  • All of those plans might turn out to be on-net bad for the world, on further reflection.


Some of my questions for going forward:

  1. How long until transformative AI arrives?
  2. Are there tractable ways to speed technical AI alignment substantially?
  3. Are there tractable ways to unblock governance experimentation?
  4. Follow up on charter city projects
  5. What’s blocking seasteading? Is it cost, as I believe?
  6. How large are the expected flow-through effects of governmental sanity interventions on other sectors?
  7. Conditional on unblocking innovation in governance, how long is it likely to take for the best innovations to propagate outward until they are standard best practices?
  8. What’s the bottleneck for human genetic intelligence augmentation?
  9. Along what dimensions would Neuralink improve human capabilities?
  10. Is “trauma” a natural kind? To what extent is it true that psychological trauma is driving exploitative and counter-productive organizational patterns in the world?
  11. How much saner is China? How long will the Chinese system remain “alive”?
  12. How different will the long term future be, if the intelligence explosion happens in one culture rather than another?


[1] –  Or some civilization or other mechanism, bearing human values.

[2] –  Though of course, there isn’t a linear relationship between the individual progress bars, and total victory. We might be “70%” of the way to a full solution to both problems (whatever that means), but between the two, not have enough of the right pieces to get a combined solution that lets us exit the critical risk period. That’s why it is only allegorical.

[3] – And, in contrast, I sometimes talk with people who are so pessimistic about alignment work, that they take it for granted that the thing to do is take over the world by conventional means.

Psychoanalyzing, people seem to gravitate to the line of attack that is within their skillset, and therefore feels more comfortable to think about. This seems like a perfectly good heuristic for specialization, but it doesn’t seem like a particularly good way to identify which approach is more tractable in the abstract.

How do we prepare for final crunch time? – Some initial thoughts

[epistemic status: Brainstorming and first draft thoughts.

Inspired by something that Ruby Bloom wrote and the Paul Christiano episode of the 80,000 hours podcast.]

One claim I sometimes hear about AI alignment [paraphrase]:

“It is really hard to know what sorts of AI alignment work are good, this far out from transformative AI. As we get closer, we’ll have a clearer sense of what AGI / Transformative AI is likely to actually look like, and we’ll have much better traction on what kind of alignment work to do. In fact, it might be the case that MOST of the work of AI alignment is done in the final few years before AGI, when we’ve solved most of the hard capabilities problems already and we can work directly, with good feedback loops, on the sorts of systems that we want to align.”

Usually this is taken to mean that the alignment research that is being done today is primarily to enable or make easier future, more critical, alignment work. But “progress in the field” is only one dimension to consider in boosting the work of alignment researchers in final crunch time.

In this post I want to take the above posit seriously, and consider the implications. If most of the alignment work that will be done is going to be done in the final few years before the deadline, our job in 2021 is mostly to do everything that we can to enable the people working on the problem in the crucial period (which might be us, or our successors, or both) so that they are as well equipped as we can possibly make them.

What are all the ways that we can think of that we can prepare now, for our eventual final exam? What should we be investing in, to improve our efficacy in those final, crucial, years?

The following are some ideas.


For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.

A different kind of work

Most current AI alignment work is pretty abstract and theoretical, for two reasons. 

The first reason is a philosophical / methodological claim: there’s a fundamental “nearest unblocked strategy” / overfitting problem. Patches that correct clear and obvious alignment failures are unlikely to generalize fully; you’ll only have constrained unaligned optimization to channels that you can’t recognize. For this reason, some claim, we need an extremely robust, theoretical understanding of intelligence and alignment, ideally at the level of proofs.

The second reason is a practical consideration: we just don’t have powerful AI systems to work with, so there isn’t much in the way of tinkering and getting feedback.

The second objection becomes less relevant in final crunch time: in this scenario, we’ll have powerful systems 1) that will be built along the same lines as the systems that it is crucial to align, and 2) that will have enough intellectual capability to pose at least semi-realistic “creative” alignment failures (i.e., current systems are so dumb, and live in such constrained environments, that it isn’t clear how much we can learn from them about aligning literal superintelligences).

And even if the first objection ultimately holds, theoretical understanding often (usually?) follows from practical engineering proficiency. It might be a fruitful path to try out different alignment approaches empirically on semi-powerful systems, tinkering to discover new approaches, and then backing up to do robust theory-building given much richer data about what seems to work.

I could imagine sophisticated setups that enable this kind of tinkering and theory building. For instance, I imagine a setup that includes:

  • A “sandbox” that affords easy implementation of many different AI architectures and custom combinations of architectures, with a wide variety of easy-to-create, easy-to-adjust training schemes, and a full suite of interpretability tools. We could quickly try out different safety schemes, in different distributions, and observe what kinds of cognition and behavior result.
  • A meta AI that observes the sandbox, and all of the experiments therein, to learn general principles of alignment. We could use interpretability tools to use this AI as a “microscope” on the AI alignment problem itself, abstracting out patterns and dynamics that we couldn’t easily have teased out with only our own brains. This meta system might also play some role in designing the experiments to run in the sandbox, to allow it to get the best data to test its hypotheses.
  • A theorem prover that would formalize the properties and implications of those general alignment principles, to give us crisply specified alignment criteria by which we can evaluate AI designs.

Obviously, working with a full system like this is quite different from abstract, purely theoretical work on decision theory or logical uncertainty. It is closer to the sort of experiments that the OpenAI and DeepMind safety teams have published, but even that is a pretty far cry from the kind of rapid-feedback tinkering that I’m pointing at here.

Given that the kind of work that leads to research progress might be very different in final crunch time than it is now, it seems worth trying to forecast what shape that work will take and trying to see if there are ways to practice doing that kind of work before final crunch time.


Obviously, when we get to final crunch time, we don’t want to have to spend any time studying fields that we could have studied in the lead-up years. We want to have already learned all the information and ways of thinking that we’ll want to know, then. It seems worth considering which fields we’ll wish we had known when the time comes.

The obvious contenders:

  • Machine Learning
  • Machine Learning interpretability
  • All the Math of Intelligence that humanity has yet amassed [Probability theory, Causality, etc.]

Some less obvious possibilities:

  • Neuroscience?
  • Geopolitics, if it turns out that which technical approach is ideal hinges on important facts about the balance of power?
  • Computer security?
  • Mechanism design in general?

Research methodology / Scientific “rationality”

We want the research teams tackling this problem in final crunch time to have the best scientific methodology and the best cognitive tools / habits for making research progress, that we can manage to provide them.

This maybe includes skills or methods in the domains of:

  • Ways to notice as early as possible if you’re following an ultimately-fruitless research path
  • Noticing / Resolving /Avoiding blindspots
  • Effective research teams
  • Original seeing / overcoming theory blindness / hypothesis generation
  • ???


One obvious thing is to spend time now investing in habits and strategies for effective productivity. It seems senseless to waste precious hours in our acute crunch time due to procrastination or poor sleep. It is well worth it to solve those problems now. But aside from the general suggestion to get your shit in order and develop good habits now, I can think of two more specific things that seem good to do.

Practice no-cost-too-large productive periods

There may be trades that could make people more productive on the margin, but that are too expensive in regular life. For instance, I think that I might conceivably benefit from having a dedicated person whose job is to always be near me, so that I can duck with them with zero friction. I’ve experimented a little bit with similar ideas (like having a list of people on call to duck with), but it doesn’t seem worth it for me to pay a whole extra person-salary to have the person be on call, and in the same building, instead of on-call via zoom.

But it is worth it at final crunch time.

It might be worth it to spend some period of time, maybe a week, maybe a month, every year, optimizing unrestrainedly for research productivity, with no heed to cost at all, so that we can practice doing that. This is possibly a good thing to do anyway, because it might uncover trades that, on reflection, are actually worth importing into my regular life.

Optimize rest

One particular subset of personal productivity jumps out at me: each person should figure out their actual optimal cadence of rest.

There’s a failure mode that ambitious people commonly fall into, which is working past the point when marginal hours of work are negative. When the whole cosmic endowment is on the line, there will be a natural temptation to push yourself to work as hard as you can, and forgo rest. Obviously, this is a mistake. Rest isn’t just a luxury: it is one of the inputs to productive work.

There is a second level of this error, in which one grudgingly takes the minimal amount of rest time, and gets back to work. But the amount of rest time required to stay functional is not the optimal amount of rest, the amount that maximizes productive output. Eliezer mused years ago that he felt kind of guilty about it, but maybe he should actually take two days off between research days, because the quality of his research seemed better on days when he happened to have had two rest days preceding.

In final crunch time, we want everyone to be resting the optimal amount that actually maximizes area under the curve, not the one that maximizes work-hours. We should do binary search now, to figure out what the optimum is.
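To make that search concrete: if output per calendar day really is a smooth, single-peaked function of rest days, the textbook tool is a ternary search (a close cousin of the binary search mentioned above). Here’s a minimal sketch; the productivity model is entirely made up for illustration, and the real curve would have to come from self-experiment data:

```python
def productivity(rest_days):
    # Made-up unimodal model: the quality of a work day rises with preceding
    # rest, but longer rest means fewer work days per calendar day.
    quality = rest_days / (1 + rest_days)  # quality of the single work day
    cycle_length = 1 + rest_days           # one work day plus the rest days
    return quality / cycle_length          # output per calendar day

def ternary_search_max(f, lo, hi, iters=100):
    # Standard ternary search for the maximum of a unimodal function f
    # on the interval [lo, hi].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

best = ternary_search_max(productivity, 0.0, 7.0)
print(round(best, 2))  # → 1.0 for this toy model
```

In practice you’d replace `productivity` with measured data, which is noisy, so something like a coarse grid of self-experiments would be more robust than a literal search on a closed-form curve.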

Also, obviously, we should explore to discover highly effective methods of rest, instead of doing whatever random things seem good (unless, as it turns out, “whatever random thing seems good” is actually the best way to rest).

Picking up new tools

One thing that will be happening in this time is that there will be a flurry of new AI tools that can radically transform thinking and research, with perhaps increasingly radical tools coming at a rate of once a month or faster.

Being able to take advantage of those tools and start using them for research immediately, with minimal learning curve, seems extremely high leverage.

If there are things that we can do that increase the ease of picking up new tools and using them to their full potential (instead of, as is common, using only the features afforded by your old tools, and only very gradually adopting the new affordances), those seem well worth doing.

Some thoughts (probably bad):

  • Could we set up our workflows, somehow, such that it is easy to integrate new tools into them? Like if you already have a flexible, expressive research interface (something like Roam?), and you’re used to regular changes in the capability of the backend of the interface?
  • Can we just practice? Can we have a competitive game of introducing new tools, and trying to orient to them and figure out how to exploit them as creatively as possible?
  • Probably it should be some people’s full time job to translate cutting edge developments in AI into useful tools and practical workflows, and then to teach those workflows to the researchers?
  • Can we design a meta-tool that helps us figure out how to exploit new tools? Is it possible to train an AI assistant specifically for helping us get the most out of our new AI tools?
  • Can we map out the sorts of constraints on human thinking and/or the sorts of tools that will be possible, in advance, so that we can practice with much weaker versions of those tools, and get a sense of how we would use them, so that we’re ready when they arrive?
  • Can we try out new tools on psychedelics, to boost neuroplasticity? Is there some other way to temporarily weaken our neural priors? Maybe some kind of training in original seeing?

Staying grounded and stable in spite of the stakes

Obviously, being one of the few hundred people on whom the whole future of the cosmos rests, while the singularity is happening around you, and you are confronted with the stark reality of how doomed we are, is scary and disorienting and destabilizing.

I imagine that that induces all kinds of psychological pressures that might find release in any of a number of concerning outlets: deluding oneself about the situation, becoming manic and frenetic, sinking into immovable depression.

We need our people to have the virtue of being able to look the problem in the eye, with all of its terror and disorientation, and stay stable enough to make tough calls, and make them sanely.

We’re called to cultivate a virtue (or maybe a set of virtues) of which I don’t know the true name, but which involves courage, groundedness, and determination-without-denial.

I don’t know what is entailed in cultivating that virtue. Perhaps meditation? Maybe testing oneself at literal risk to one’s life? I would guess that people in other times and places, who needed to face risk to their own lives and those of their families, did have this virtue, or some part of it, and it might be fruitful to investigate those cultures and how that virtue was cultivated.