On category I and category II problems

Google doc version, which is better for inline comments.

Please note that while I make claims about what I understand other people to believe, I don’t speak for them, and might be mistaken. I don’t represent MIRI or CFAR, and they might disagree. The opinions expressed are my own.

Someone asked me why I am broadly “pessimistic”. This post, which is an articulation of an important part of my world view, came out of my answer to that question.

Two kinds of problems: 

When I’m thinking about world scale problems, I think it makes sense to break them down into two categories. These categories are nebulous (like all categories in the real world), but I also think that these are natural clusters, and most world scale problems will clearly be in one or the other.

Category I Problems

If I ask myself “Will factory farming have been eliminated 100 years from now?”, I can give a pretty confident “Yes.”

It looks to me that we have a pretty clear understanding of what needs to happen to eliminate factory farming. Basically, we need some form of synthetic meat (and other animal products) that is comparable to real meat in taste and nutrition, and that costs less than real meat to produce. Once you have that, most people will prefer synthetic meat to real meat, demand for animal products will drop off markedly, and the incentive for large scale factory farming will vanish. At that point the problem of factory farming is basically solved.

Beyond and Impossible meat are already pretty good, maybe even good enough to satisfy the typical consumer. That gives us proof of concept. So eliminating factory farming is almost just a matter of making the cost of synthetic meat < the cost of real meat.

There’s an ecosystem of companies (notably Impossible and Beyond) that are iteratively making progress on exactly the problem of driving down the cost. Possibly, it will take longer than I expect, but unless something surprising happens, that ecosystem will succeed eventually.

In my terminology, I would say that there exists a machine that is systematically ratcheting towards a solution to factory farming. There is a system that is making incremental progress towards the goal, month by month, year by year. If things continue as they are, and there aren’t any yet-unseen fundamental blockers, I have strong reason to think that the problem will be fully solved eventually.

Now, this doesn’t mean that the problem is already solved. Figuring out ways to make this machine succeed marginally faster is still really high value, because approximately a billion life-years are spent in torturous conditions in factory farms every year. If you can spend your career speeding up the machine by six months, that prevents at least half a billion life-years of torture (more than that if you shift the rate of progress instead of just translating the curve to the left).
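To make the arithmetic above concrete, here is a minimal sketch. It assumes the rough figure of a billion life-years per year from the paragraph above, and it assumes the simplest case, where the whole progress curve is moved earlier by a fixed amount. The function name and the constant are hypothetical, chosen only for illustration.

```python
# Rough estimate taken from the text: life-years spent in factory farms each year.
LIFE_YEARS_PER_YEAR = 1_000_000_000


def life_years_prevented(speedup_years: float,
                         annual_burden: float = LIFE_YEARS_PER_YEAR) -> float:
    """Burden avoided when the progress curve is moved earlier by `speedup_years`.

    Assumption: the curve is only translated to the left, so the saving is the
    current annual burden multiplied by the size of the shift.
    """
    return annual_burden * speedup_years


# A six-month speedup:
print(life_years_prevented(0.5))  # 500000000.0
```

This covers only the "translate the curve to the left" case. A change in the rate of progress would save more, but estimating it would need a model of the whole curve, which this sketch does not attempt.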

Factory farming is an example of what (for lack of a better term), I will call a “category I” problem. 

Category II Problems

In contrast, suppose I ask “Will war have been eliminated 100 years from now?” To this question, my answer has to be “I don’t know, but probably not?”

That’s not because I think that ending war is impossible. I’m pretty confident that there exists some set of knowledge (of institution design? of game theory? of diplomacy?) with which we could construct a system that robustly avoided war, forever.

But in my current epistemic state, I don’t even have a sketch of how to do it. There isn’t a well specified target that if we hit it, we’ll have achieved victory (in the way that “cost of synthetic meat < cost of real meat” is a victory criterion).  And there is no machine that is systematically ratcheting towards progress on eliminating war.

That isn’t to say that there aren’t people working on all kinds of projects which are trying to help with the problem of “war”, or reducing the incidence of war (peace talks, education, better international communication, what have you). There are many people, in some sense, working hard at the problem. 

But their efforts don’t cohere into a mechanism that reliably produces incremental progress towards solving the problem. Or said differently, there are things that people can do that help with the problem on the margin, but those marginal improvements don’t “add up to” a full solution. It isn’t the case that if we do enough peace talks and enough education and enough translation software, war will be permanently solved. 

In a very important sense, humanity does not have traction on the problem of ending war, in a way that it does have traction on the problem of ending factory farming. 

Ending war isn’t impossible in principle, but there are currently no levers that we can pull to make systematic progress towards a solution. We’re stuck doing things that might help a little on the margin, but we don’t know how to set up a system such that if that machine runs long enough, war will be permanently solved. 

I’m going to call problems like this, where there does not exist a machine that is making systematic progress, a “category II” problem.

Category I vs. Category II

Some category I problems: 

  • Factory Farming
  • Global Poverty
  • “Getting off the rock” (but possibly only because Elon Musk was born, and took personal responsibility for it)
  • Getting a computer to almost every human on earth
  • Legalizing / normalizing gay marriage 
  • Eradicating Malaria
  • Possibly solving aging?? (and if so, probably only because a few people in the transhumanism crowd took personal responsibility for it)

Some Category II problems

  • War
  • AI alignment
  • Global coordination
  • Civilizational sanity / institutional decision making (on the timescale of the next century)
  • Civilizational collapse
  • Achieving stable, widespread mental health
  • Political polarization
  • Cost disease

(If you can draw a graph that shows the problem more-or-less steadily getting better over time, it’s a category I problem. If progress is being made in some sense, but progress happens in stochastic jumps, or it’s non-obvious how much progress was made in a particular period, it’s a category II problem.)

In order for factory farming to not be solved, something surprising, from outside of my current model, needs to happen. (Like maybe technological progress stopping, or a religious movement that shifts the direction of society.)

Whereas in order for war to be solved, something surprising needs to happen. Namely, there needs to be some breakthrough, a fundamental change in our understanding of the problem, that gives humanity traction on the problem, and enables the construction of a machine that can make systematic, incremental progress.  

(Occasionally, a problem will be in an in-between category, where there doesn’t yet exist a machine that is making reliable progress on the problem, but that isn’t because our understanding of the shape of the problem is insufficient. Sometimes the only reason there isn’t a machine doing this work is that no person or group, of sufficient competence, has taken heroic responsibility for getting a machine started.

For instance, I would guess that our civilization is capable of making steady, incremental progress on the effectiveness of cryonics, up to the point of cryonics being a reliable, functional technology. But progress on “better cryonics” is mostly stagnant. I think that the only reason there isn’t a machine incrementally pushing on making cryonics work is that no one (or at least no one of sufficient skill) has taken it upon themselves to solve that problem, thereby jumpstarting a machine that makes steady incremental progress on it. [ 1 ]

It is good to keep in mind that these category 1.5 problems exist, but they mostly don’t bear on the rest of this analysis.)

Maybe the most important thing: Category I problems get solved as a matter of course. Category II problems get solved when we stumble into a breakthrough that turns them into category I problems.

Here, a “breakthrough” is “some change (either in our understanding (the map) or in the world (the territory)) that causes the shape of the problem to shift, such that humanity can now make reliable systematic progress towards a solution, unless something surprising happens.”

Properties of Category I vs. Category II problems

Category I: There exists a “machine” that is making systematic, reliable progress on the problem
Category II: There isn’t a “machine” making systematic, reliable progress on the problem, and humanity doesn’t yet know how to make such a machine

Category I: Marginal improvements can “add up” to a full solution to the problem
Category II: Marginal improvements don’t “add up” to a full solution to the problem

Category I: The problem will be solved, unless something surprising happens
Category II: The problem won’t be solved until something surprising happens

Category I: We’re mostly not depending on luck
Category II: We’re substantially depending on luck

Category I: Progress is usually incremental
Category II: Progress is usually stochastic

Category I: Progress is pretty “concrete”; it is relatively unlikely that some promising project will turn out to be completely useless
Category II: Progress is “speculative”; it is a live possibility that any given piece of work that we consider progress will later prove completely useless in light of better understanding

Category I: The bottleneck to solving the problem is the machine going better or faster
Category II: The bottleneck to solving the problem is a breakthrough, defined as “some shift (either in our map or in the territory) that changes the shape of the problem enough that we can make reliable systematic progress on it”

Category I: Solving the problem does not require any major conceptual or ontological shifts; progress consists of many, constrained, engineering problems
Category II: Our understanding of the problem, or ontology of the problem, will change at least once, but most likely many times, on the path to a full solution

Category I: There might be graphs that show the problem getting better, more-or-less steadily, over time
Category II: It’s quite hard to assess how much progress is made in a given unit of time, or even if exciting “milestones” actually constitute progress

Category I: We know how to train people to fill roles in which they can reliably contribute to progress on the problem; mostly what is needed is effective people to fill those roles
Category II: We have only shaky and tenuous knowledge of how to train people to make progress on the problem; mostly what’s needed is people who can figure out for themselves how to get oriented

Category I: If all the relevant inputs were free, the problem would be solved or very close to solved (e.g. with a perpetual motion machine that produced arbitrarily large amounts of the ingredients to Impossible meat, factory farming would be over)
Category II: If the inputs were free, this would not solve the problem (e.g. with a hypercomputer, AI safety would not be solved)

Luck and Deadlines

In general, I’m optimistic about category I problems, and pessimistic about category II problems (at least in the short term). 

And crucially, humanity doesn’t know how to systematically make progress on intentionally turning a given category II problem into a category I problem. 

We’re not hopeless at it. It is not completely a random walk. Individual geniuses have sometimes applied themselves to a problem that we were so confused about as to not have traction on it, and produced a breakthrough that gives us traction. For instance, Newton giving birth to mathematicized physics. [Note: I’m not sure that this characterization is relevantly correct.]

But overall, when those breakthroughs occur, it tends to be in large part due to chance. We mostly don’t know how to make breakthroughs, on particular category II problems, on demand. 

Which is to say, any given problem transitioning from II to I, depends on luck. 

And unfortunately, it seems like some of the category II problems I listed above 1) are crucial to the survival of civilization, and 2) have deadlines.

From my epistemic vantage point, it looks like if we don’t solve some subset of those problems before some unknown deadline (possibly as soon as this century), we die. That’s it. Game over.

Human survival depends on solving some problems for which we currently have no levers. There is nothing that we can push on to make reliable, systematic progress. And there’s no machine making reliable, systematic progress.

Absent a machine that is systematically ratcheting forward progress on the problem, there’s no strong reason to think that it will be solved. 

Or to state the implicit claim more explicitly: 

  1. Large scale problems are solved only when there is a machine incrementally moving towards a solution. 
  2. There are a handful of large scale problems that seem crucial to solve in the medium term. 
  3. There aren’t machines incrementally moving towards solutions to those problems.

So by default, unless something changes, I expect that those problems won’t be solved.

On AI alignment

There are people who dispute that AI risk is a category II problem, and they are accordingly more optimistic. I believe that Rohin Shah and Paul Christiano both think that there’s a pretty good chance that business-as-usual AI development will solve alignment as a matter of course, because alignment problems are on the critical path to making functioning AI. 

That is, they think that there is an existing machine that is solving the problem: the AI/ML field itself.

If I understand them correctly, they both think that there is a decent chance that their EA-focused alignment work won’t have been counterfactually relevant in retrospect, but it still seems like a good altruistic bet to lay groundwork for alignment research now. 

In the terms of my ontology here, they see themselves as shoring up the alignment-progress machine, or helping it along with some differential progress, just in case the machine turns out to have been inadequate to the task of solving the problem before the deadline. They think that there is a substantial chance that their work will, in retrospect, turn out to have been counterfactually irrelevant, but because getting the AI alignment problem right seems really high leverage for the value of the future, it is a good altruistic bet to do work that makes it more likely that the machine will succeed, on the margin. 

This is in marked contrast to how I imagine the MIRI leadership is orienting to the problem: When they look at the world, it looks to them like there isn’t a machine ratcheting towards safety at all. Yes, there are some alignment-like problems that will be solved in the course of AI development, but largely by patches that invite nearest-unblocked strategy problems, and which won’t generalize to extremely powerful systems. As such, MIRI is making a desperate effort to make or to be a machine that ratchets toward progress on safety.

I think this question of “is there a machine that is ratcheting the world towards more AI safety”, is one of the main cruxes between the non-MIRI optimists, and the MIRI-pessimists, which is often overshadowed by the related, but less crucial question of “how sharp will takeoff be?”

On “rationality”

Over the past 3 years, I have regularly taught at AIRCS workshops. These are mainly a recruitment vehicle for MIRI, run as a collaboration between MIRI and CFAR.

At AIRCS workshops, one thing that we say early on is that AI safety is a “Preparadigmatic field”, which is more or less the same as saying that AI alignment is a category II problem. AI safety as a field hasn’t matured to the point that there are known levers for making progress.

And, we explain, we’re going to teach some rationality techniques at the workshop, because those are supposed to help one orient to a preparadigmatic field.

Some people are skeptical that these funny rationality methods are helpful at all (which, to be clear, I think is a quite reasonable position). Other people give the opposite critique, “it seems like clear thinking and effective communication and generally making use of all your mind’s abilities, is useful in all fields, not just preparadigmatic ones.”

But this is missing the point slightly. It isn’t so much that these tools are particularly helpful for preparadigmatic fields; it’s that in preparadigmatic fields, this is the best we can provide.

More developed fields have standard methods that are known to be useful. We train aspiring physicists in calculus, because we have ample evidence that calculus is an extremely versatile tool for making progress on physics, for example. [another example would be helpful here]

We don’t have anything like that for AI safety. There are not yet standard tools in the AI safety toolbox that we know to be useful and that everyone should learn. We don’t have that much traction on the problems.

So as a backstop, we teach very general principles of thinking and problem solving, as a sort of “on ramp” to thinking about your own thinking and how you might improve your own process. The hope is that this will translate into skill in getting traction on a confusing domain that doesn’t yet have standard methods.

When you’re flailing, and you don’t have any kind of formula for making research progress, it can make sense to go meta and think about how to think about how to solve problems. But if you can just make progress on the object level, you’re almost certainly better off doing that.

People sometimes show up at AIRCS workshops expecting us to give them concrete technical problems that they can try and solve, and are sometimes discouraged that instead we’re teaching these woo-y or speculative psychological techniques. 

But, by and large, we DON’T have concrete, well-specified, technical problems to solve (note that this is a somewhat contentious claim, see the bit about AI safety above). The work that needs to be done is something like “wandering around in one’s confusion in such a way that one can crystalize well specified technical problems.” And how to do that is very personal and idiosyncratic: we don’t have systematized methods for doing that, such that someone can just follow the instructions and get the desired result. But we’ve found that the woo-y tools seem to give people new levers and new perspectives for figuring out how to do this for themselves, so that’s what we have to share.

As a side note: I have a gripe that “rationality” has come to conflate two things: there’s “how to make progress on natural philosophy when you don’t have traction on the problem” and, separately, there’s “effective decision-making in the real world”. These two things have some overlap, but they are really pretty different. And I think that development on both of them has been hurt by lumping them under one label.


If I were to offer a critique of Effective Altruism it would be this: EA in general doesn’t distinguish between category I and category II problems. 

Of course, this doesn’t apply to every person who is affiliated with EA. But many EAs, and especially EA movement builders, implicitly think of all problems as category I problems. That is, they are implicitly behaving as if there exists a machine that will convert resources (talent, money, attention) into progress on important problems.

And, as I said, there are many problems for which there does exist a machine doing that. But in cases where there isn’t such a machine, because the crucial breakthrough that would turn the problem from category II to category I hasn’t occurred yet, this is counterproductive. 

The sort of inputs that allow a category I problem-solving machine to go better or faster, are just very different from the sort of inputs that make it more likely that humanity will get traction on a category II problem. 

Ease of Absorbing Talent

For one thing, more people is often helpful for solving a category I problem, but is usually not helpful for getting traction on a category II problem. Machines solving category I problems can typically absorb people, because (by dint of having traction), they are able to train people to fill useful roles in the machine. 

Folks trying to get traction on a category II problem, by definition, don’t have systematic methods by which they can make progress. So they can’t easily train people to do known-to-be-useful work. 

I think there are clusters that are able to make non-0 progress on getting traction, and that those clusters can sometimes usefully absorb people, but they basically need to be people that have a non-trivial ability to get traction themselves. Because the work to be done is trying to get traction on the problem, it doesn’t help much to have more people who are waiting to be told what to do: the thing that they need to do is figure out what to do. [ 2 ]

Benefit of Marginal Improvements

For another thing, because machines solving category I problems can generally absorb resources in a useful way, and because they are making incremental progress, it can be useful to nudge people’s actions in the direction of a good thing, without them shifting all the way to doing the “optimal” thing. 

  • Maybe someone won’t go vegan, but they might go vegetarian. Or maybe a company won’t go vegetarian, but it can be persuaded to use humanely farmed eggs.
  • Maybe this person won’t change their whole career-choice, but they would be open to choosing more impact oriented projects within their career.
  • Maybe most people won’t become hard-core EAs, but if many people change their donations to be more effective on the margin, that seems like a win.
  • Maybe an AI development company won’t hold back the deployment of their AI system for years to find a way to ensure that it is aligned, but it can be convinced to hire a safety team.

For category I problems, interventions on the margin “add up to” significant progress on the problem. Part of what it means for a problem to be category I is that there are at least some forms of marginal improvement that, in aggregate, solve the problem.

But in the domain of category II problems, marginal nudges like this are close to useless. 

Because there is not a machine, to which people can contribute, that will incrementally make progress on the problem, getting people to be somewhat more aware of the problem, or care a little about the problem, doesn’t do anything meaningful.

In the domain of a category II problem, the thing that is needed is a breakthrough (or a series of breakthroughs) that will turn it into a category I problem. 

I don’t know how to make this happen in full generality, but it looks a lot closer to a small number of highly talented, highly-invested people who are working obsessively on the problem than it looks like a large mass of people who are aware that the problem is important and will make marginal changes to their lives to help. 

A machine learning researcher who is not interested in really engaging with the hard part of the problem of AI safety (because that would require backchaining from bizarre-seeming science-fiction scenarios), but who is working on a career-friendly paper that he has rationalized, by way of some hand-wavy logic, as “maybe being relevant to AI safety someday”, is, it seems to me, quite unlikely to uncover some crucial insight that leads to a breakthrough on the problem.

Even a researcher who is sincerely trying to help with AI safety, whose impact model is “I will get a PhD, and then publish papers about AI safety” is, according to me, extremely unlikely to produce a breakthrough. Such a person is acting as if there is a role that they can fill, and if they execute well in that role, this will make progress on the problem. They’re acting as if they are part of an existing solution machine.

But, as discussed, that’s not the way it works: we don’t know how to make progress on AI safety, there aren’t straightforward things to do that will reliably make progress. The thing that is needed is people who will take responsibility for independently attempting to figure out how to make progress (which, incidentally, involves tackling the whole problem in its entirety).

If a person is not thinking at all about how to get traction on the core problem, but is just doing “the sort of activities that an AI safety researcher does”, I think it is very unlikely that their activity is going to lead to the kind of groundbreaking work that changes our understanding of the problem enough that AI alignment flips from being in category II to being in category I.

In these category II cases, shifts in behavior, on the margin, do not add up to progress on the problem. 

EA’s causal history

Because it doesn’t distinguish between category I and category II problems, EA as a culture has a tendency to try to solve all problems as if they were category I problems, namely by recruiting more people and directing them at AI alignment or AI policy or whatever.

This isn’t that surprising to me, because I think a large part of the memetic energy of EA came from the identification of a specific category I problem: There was a claim that a first world person could easily save lives by donating to effective charities focused on the third world. 

Insofar as this is true, there’s a strong reason to recruit as many people to be engaged with EA as possible: the more people involved, the more money moved, the more lives saved.

However, in the years since EA was founded, the intellectual core of the movement updated in two ways: 

Firstly, it now seems more dubious (to me at least) that lives can be cheaply and easily saved that way. [I’m much less confident of this point, and it isn’t a crux for me, so I’ve removed discussion of it to an endnote.[ 3 ] ]

And more importantly, EA realized that there are vastly more important problems than global poverty. 

X-risk has been a thread in EA discourse from the beginning: one of the three main intellectual origins of EA was LessWrong (the other two being GiveWell coming from the finance world, and Giving What We Can / 80,000 hours stemming from some Oxford philosophers). But sometime around 2014 the leadership of EA settled on long-termism and x-risk as the “most important thing”. (I was part of the volunteer team for EAG 2015, and I saw firsthand that there was an explicit push to prioritize x-risk.)

Over recent years that shift has taken form: deprioritizing earning to give, but promoting careers in AI policy, for instance. 

I claim that this pivot represents a more fundamental shift than most in EA realize. Namely, a shift from EA being the sort of thing that is attempting to make progress on a category I problem to EA being the sort of thing attempting to make progress on a category II problem. 

EA developed as a movement for making progress on a category I problem: It had as a premise that ordinary people can do a lot of good by moving money to (mostly) pre-existing charities, and by choosing high impact “careers” (where a “career” implies an already-mapped out trajectory). Category I orientation is implicit in EA’s DNA.

And so when the movement tries to make the pivot to focusing on x-risk, it implicitly orients to that problem as if it were a category I problem: “Where can we direct money and talent, to make an impact on x-risk?”

For all of the reasons above, I think this is a fundamental error: the inputs that lead to progress on a category I problem are categorically different than those that lead to progress on a category II problem. 

To state my view starkly, if somewhat uncharitably, EA is essentially shoveling resources into a non-existent x-risk progress machine, imagining that if they shovel enough, this will lead to progress on x-risk, and the other core problems of the world. As I have said, I think that there isn’t yet a machine that can make consistent incremental progress in that way.

But it would be pretty hard, I think, for EA to do something different. This isn’t a trivial error to correct: changing this would entail changing the fundamental nature of what EA is and how it orients to the world.


[ 1 ] – I’ve heard Nate Soares refer to “the curse of cryonics”, which is that anyone who has enough clear-thinking, independent thought to realize that cryonics is important can also see that there are vastly more important problems.

[ 2 ] – I think that this is a little bit of an oversimplification. I think there are ways that people can contribute usefully in a mode that is close to “executing on what some people they trust think is a good idea”, but you do need a critical mass of people who are clawing for traction themselves, or this doesn’t work. Therefore the limiting reagent is people clawing for traction, and capacity to absorb conscientious ability-to-execute talent is limited.

[ 3 ] – The world is really complicated, and I’m not sure how confident to be that our charitable interventions are working. This post by Ben Hoffman pointing out that the expected value distribution for deworming interventions trends into the negative, but most EAs don’t seem aware of this, seems on point. 

Further (though this is my personal opinion, more than any kind of consensus), the argument Eliezer makes here is extremely important for anyone taking aim at eradicating poverty. If there is some kind of control system that keeps people poor, regardless of the productivity of society, this suggests that there might be some fundamental blocker to ending poverty: until that control system is addressed somehow, all of the value ostensibly created by global health interventions is being sucked away. 

(Admittedly, this argument holds somewhat less force if one is aiming simply to reduce human suffering in the next year, rather than any kind of long term goal like “permanently and robustly end poverty.”)

Considerations like that one suggest that we should be much less certain that our favored global poverty interventions are effective. Instead of (or perhaps, in addition to) rushing to move as many resources to those interventions as possible, it seems like the more important thing is to continue trying to puzzle out, via experiments and modeling and whatever tools we have, how the relevant social systems work, to verify that we’re actually having the positive effects that we’re aiming for. It seems to me that even in the domains of global poverty, we still need a good deal of exploration relative to exploitation of the interventions we’ve uncovered.

Relatedly, it seems to me that focusing on charity is somewhat myopic: it is true that there is a machine eradicating poverty, but that machine is called capitalism, not charity donation. Maybe the charity donations help, but I would guess that if you want to really have the most impact here, the actual thing to do is not give to charities but something more sophisticated that engages more with that machine. (I might be wrong about that. Maybe in fact global health interventions are actually the best way to unblock the economic engine so that capitalism can pull the third world out of poverty faster.)

My current high-level strategic picture of the world

Follow up to: My strategic picture of the work that needs to be done, A view of the main kinds of problems facing us

This post outlines my current epistemic state regarding the most crucial problems facing humanity and the leverage points that we could, at least in principle, intervene on to solve them. 

This is based on my current off-the-cuff impressions, as opposed to careful research. Some of the things that I say here are probably importantly wrong (and if you know that to be the case, please let me know). 

My next step here is to more carefully research the different “legs” of this strategic outline, shoring up my understanding of each, and clarifying my sense of how tractable each one is as an intervention point.

None of this constitutes a plan. More like, this is a first sketch, to facilitate more detailed elaboration.

The Goal

My overall goal here is to explore the possible ways by which humanity achieves existential victory. By existential victory, I mean, 

The human race[1] survives the acute risk period, and enters a stable period in which it (or our descendants) are able to safely reflect on what a good universe entails, and then act to make the reachable universe good.

This entails humanity surviving all existential risk and getting to a state where existential risk is minimized (for instance, because we are now protected from most disasters by an aligned superintelligence, or a coalition of aligned superintelligences).

Possibly, there is an additional constraint that the human race not just survive, but remain “healthy” along some key dimensions, such as control over our world, intellectual vigor, and freedom from oppressive power structures and trauma, if detriments along those dimensions are irreparable and would therefore permanently limit our ability to reflect on what is Good.

This document describes the two basic trajectories that I can currently see, by which we might systematically achieve that goal (as opposed to succeeding by luck).

The Two Problems

In order to get to that kind of safe attractor state there appear to be two fundamental classes of problems facing humanity: technical AI alignment, and civilizational sanity.

By “technical AI alignment”, I mean the problem of discovering how to build and deploy super-humanly powerful AI systems (embodied either in a singleton, or an “ecosystem” of AIs), safely, in a way that doesn’t extinct humanity, and broadly leaves humans in control of the trajectory of the universe.

By “civilizational sanity”, I mean to point at the catch-all category of whatever causes high-leverage decision makers to make wise, scope-sensitive, non-self-destructive choices.

Civilizational Sanity includes whatever factors cause your society to do things like “saving ~500,000 lives by running human challenge trials on all existing COVID vaccines in February 2020, scaling up vaccine production in parallel with market mechanisms, and then administering vaccinations, en masse, to everyone who wants one, with minimal delay”, or something at least that effective, rather than what the US actually did.

It also includes whatever it takes for a government to successfully identify and carry through good macroeconomic policy (which I’ve heard is NGDP targeting, though I don’t personally know).

And it includes whatever factors cause it to be the case that your civilization suddenly acquiring god-like powers (via transformative AI or some other method), results in increased eudaimonia instead of in some kind of disaster.

I think that the only shot we have of exiting the critical risk period by something other than luck is sufficient success at solving AI alignment or sufficient success at solving civilizational sanity, and implementing our solution.

(The “Strategic Background” section of this post from MIRI outlines a similar perspective of the high level problem as I outline in this document. However it elaborates, in more detail, a path by which AI alignment would allow humanity to exit the acute risk period (minimally aligned AI -> AGI powered technological development -> risk mitigating technology -> pivotal act that stabilizes the world), and de-emphasizes broad-based civilizational sanity improvements as another path out of the acute risk period.)


To some degree, solutions to either technical alignment or civilizational sanity can substitute for each other, insofar as a full solution to one of these problems would approximately obviate the need for solving the other.

For instance, if we had a full and complete understanding of AI alignment, including rigorous proofs and safe demonstrations of alignment failures, fully-worked-out safe engineering approaches, and crisp theory tying it all together, we would be able to exit the critical risk period. 

Even if it wasn’t practical for a small team to code up an aligned AI and foom, with that level of detail, it would be easy to convince the existing AI community (or perhaps just the best equipped team) to build aligned AI, because one could make the case very strongly for the danger of conventional approaches, and provide a crisply-defined alternative.

On the flip side, at some sufficiently high level of global civilizational sanity, key actors would recognize the huge cost of unaligned AI, and successfully coordinate to prevent anyone from building unaligned AI until alignment theory has been worked out.

We can make partial progress on either of these problems. The task facing humanity as a whole is to make sufficient progress on one, the other, or both, of these problems in order to exit the acute risk period. Speaking allegorically, we need the total progress on both to “sum to 1.” [2]

A note on “sufficiency”

Above, I write “I think that the only shot we have of exiting the critical risk period by something other than luck is sufficient success at solving AI alignment or sufficient success at solving civilizational sanity…”.

I want to clearly highlight that the word “sufficient” is doing a lot of work in that sentence. “Sufficient” progress on AI alignment or “sufficient” progress on civilizational sanity is not yet operationalized enough to be a target. I don’t know what constitutes “enough” progress on either one of these, and I don’t know if I could recognize it if I saw it. 

Civilizational Sanity, in particular, is always a two-place function: I can only judge a civilization to be insane relative to my own epistemic process. If societal decision making improves, but my own process improves even faster, the world will still seem mad to me, from my new, more privileged vantage point. So in that sense, the goalposts should be constantly moving. 

My key claim is only that there is some frontier defined by these axes such that, if the world moves past that frontier, we will be out of the acute risk period, even though I don’t know where that frontier lies.  

A note on timelines

When I talk about civilizational sanity interventions as a line of attack on AI risk, folks often express skepticism that we have enough time: AI timelines are short, so short that it seems unlikely that plans that attempt to reform the decision making process of the whole world will bear fruit before the 0 hour. [3]

I think that this is wrong-headed. It might very well be the case that we don’t have time for any sufficiently good general sanity boosting plans to reach fruition. But it might just as well be the case that we don’t have time for our technical AI alignment research to progress enough to be practically useful.

Our basic situation (I’m claiming) is that we need to get either to a correct alignment theory, or to a generally sane civilization, before the transformative AI countdown reaches 0. But we don’t know how long either of those projects will take. Reforming the decision processes of the powerful places in the world might take a century or more, but so might solving technical alignment.

Absent more detailed models about both approaches, I don’t think we can assume that one is more tractable, more reliable, or faster, than the other.

AI alignment in particular?

This breakdown is focused on the AI alignment problem in particular (it’s taking up half of the problem space), which may give the impression that AI risk is the only, or perhaps the most dangerous, existential risk.

While AI risk does seem to me to pose the plurality of the risk to humanity, that isn’t the main reason for breaking things down in this way. 

Rather it’s more that every intervention that I can see that has a shot of moving us out of the acute risk period goes through either powerful AI, or much saner civilization, or both. [I would be excited to have counterexamples, if you can think of any.]

We need protection against bio-risk, nuclear war, and civilizational collapse / decline. But robust protection against any one of those doesn’t protect us from the others by default. Aligned AI and a robustly sane civilization are both general enough that a sufficiently good version of either one would eliminate or mitigate the other risks. Any other solution-areas that have that property, and don’t flow through aligned AI or a generally sane civilization, would deserve their own treatment in this strategic map, but as of yet, I can’t think of any.

Technical AI alignment

I don’t have much to say about the details of this project. In broad strokes, we’re hoping to get a correct enough philosophical understanding of the concepts relevant to AI alignment, formalize that understanding as math, and eventually develop those formalizations into practical engineering approaches. (Elaboration on this trajectory here.)

(There are some folks who are going straight for developing engineering frameworks [links], hoping that they’ll either work, or give us a more concrete, and more nuanced understanding of the problems that need to be solved.)

It would be quite important to know if there are better or faster ways to make progress here. But my current sense of things is that it is just a matter of people doing the research work + recruiting more people who can do the research work. See my diagram here.

Civilizational Sanity

Follow up to: What are some Civilizational Sanity Interventions

This second category is much less straightforward. 

Within the broad problem space of “causing high-level human decision making to be systematically sane”, I can see a number of specific lines of attack, but I have wide error bars on how tractable each one is.

Those lines of attack are

  1. Unblocking governance innovation
  2. Powerful intelligence enhancement
  3. Reliable, scalable, highly effective resolution of psychological trauma
  4. Chinese ascendency

I’m sure this list isn’t exhaustive. These four are the only interventions that I currently know of that seem like (from my current epistemic state) they could transform society enough that we could, for instance, handle AI risk gracefully. 

Relationship between these legs

In particular, there’s an important open question of how these approaches relate to each other, and the broader civilizational sanity project. 

I described above that I think that “AI alignment” and “civilizational sanity” have an “or” or a “sum” relationship: sufficient progress on only one of them can allow us to exit the critical risk period.

There might be a similar relationship between the following civilizational sanity interventions: pushing on any one of them, far enough, leads to a large jump in civilizational sanity, kicking off a positive feedback loop. OR it might instead be that only some of these approaches attack the fundamental problem, and without success on that one front, we won’t see large effects from the others.

Unblocking Innovation in Governance

Better Governance is Possible

The most obvious way to improve the sanity of high-leverage decisions on planet earth is governmental reform.

Our governmental decision making processes are a mess. National politics is tribal politics writ large: instead of a societal-level epistemology trying to select the best policies, we have a bludgeoning match over which coalitions are best, and which people should be in charge. Politicians are selected on the basis of electability, not expertise, or even alignment with society, yet somehow we seem to be ending up with candidates that no one is enthusiastic about. Congress is famously in a semi-constant self-strangle-hold, unable to get anything done. And the constraints of politics force those politicians to say absurd things in contradiction with, for instance, basic economic theory, and to grandstand about things that don’t matter and (even worse) things that do.

The current system has all kinds of analytically demonstrable game-theoretic drawbacks that make undesirable outcomes all but inevitable, including a two-party system that no one likes much, principal-agent problems between the populace and the government, and net societal losses as a result of allocation of benefits to special interest groups.

There hasn’t been a major innovation in high-level governance since the invention and wide-scale deployment of democracy in the 18th century. It seems like we can do better. We could, in principle, have governmental institutions that are effective epistemologies, that are able to identify problems, and that determine and act on policies at the frontier of society’s various tradeoffs instead of at the frontier of the tradeoffs of political expediency.

And because governments have so much influence, more effective information processing in that sector could lead to better institution designs in all other sectors. Public policy is in part a matter of creating and regulating other institutions. Saner government decision making entails setting up efficient and socially-beneficial incentives for health, education, etc., which selects for effective institutions in those more specific sectors. In this way, government is a meta-institution that shapes other institutions. (It’s unclear to me to what degree this is true. How much does better policy at the governmental level automatically correct the inefficiencies of, say, the medical bureaucracy?)

One might therefore think that a particularly high leverage intervention is to develop new systems of governance. But humanity has a pretty large backlog of governance innovations that seem much better than our current setups on a number of dimensions, from the simple, like using Single Transferable Vote instead of First Past the Post, to the radical, like Futarchy, or the abolition of private property in favor of a COST system.

It seems to me that the bottleneck for better governmental systems is not possible alternatives, but rather the opportunity to experiment with those alternatives. Apparently, there are approximately no venues available for governmental innovation on planet earth.

This is not very surprising, because incumbents in power benefit from the existing power structure and therefore oppose replacing it with a different mechanism. In general, everyone who has the ability to gatekeep experiments with new governance mechanisms has an incentive to see those experiments as a threat.

However, widespread experimentation and innovation in governance would likely be a huge deal, because it would allow humanity as a whole to identify the most successful mechanisms, which, having been shown to work, could be tried at larger scales, and eventually widely adopted.

Experimentation Leads to Eventual Wide Adoption

The basic argument that merely allowing experimentation will eventually lead to better governance on a global scale is as follows: 

Many governance mechanisms, if tried, will not only 1) surpass existing systems, but 2) surpass them in a legible way, both in aggregate outcomes (like economic productivity, employment, and tax rates), and from direct engagement with those systems (for instance, once voters become familiar with Futarchy, it might seem absurd that one would elect individuals who are supposed both to represent one’s values and to have good plans for achieving those values). 

If the condition of “legible superiority” holds, there would be pressure to replicate those mechanisms elsewhere, at all different scales. Eventually, the best innovations simply become the new standard practices.

Similarly, for many incentive-aligning interventions, not using such methods is a stable attractor: it is in the interests of those in power to resist their adoption. But widespread use of such methods is also a stable attractor: once they are common, it is in the interests of those in power to keep using them. As Robin Hanson says of prediction markets:

I’d say if you look at the example of cost accounting, you can imagine a world where nobody does cost accounting. You say of your organization, “Let’s do cost accounting here.”

That’s a problem because you’d be heard as saying, “Somebody around here is stealing and we need to find out who.” So that might be discouraged.

In a world where everybody else does cost accounting, you say, “Let’s not do cost accounting here.” That will be heard as saying, “Could we steal and just not talk about it?” which will also seem negative.

Similarly, with prediction markets, you could imagine a world like ours where nobody does them, and then your proposing to do it will send a bad signal. You’re basically saying, “People are bullshitting around here. We need to find out who and get to the truth.”

But in a world where everybody was doing it, it would be similarly hard not to do it. If every project with a deadline had a betting market and you say, “Let’s not have a betting market on our project deadline,” you’d be basically saying, “We’re not going to make the deadline, folks. Can we just set that aside and not even talk about it?”

This may generalize to many institution designs that are better than the status quo.

For these reasons, finding ways around the general moratorium on governmental innovation, so that new governance mechanisms can be tried, could pay huge dividends.

Strategies to allow for Experimentation

Currently, the only approaches I’m aware of for creating spaces for governmental innovation are charter cities and seasteading.

Charter cities are bottlenecked on legal restrictions, and the practical coordination problem of getting a critical mass of residents. But I’m hopeful that COVID has caused a permanent shift to remote work, which will give people more freedom in where to live, and increase competition-in-governance between cities and states that want to attract talent.

Seasteading is currently bottlenecked on the engineering problem of creating livable floating structures, cheaply enough to be scalable. [Double check if cost is actually the key concern.]

Repeatable reform templates

I wonder if there might be a third, more abstract, line of attack on unblocking governance innovation: developing a repeatable method to change existing governmental structures in a way that incentivizes powerful incumbents.

If it were possible to simply buy out incumbents and overhaul the system, that might be a huge opportunity. However, I guess that in most liberal democracies, this is both illegal and generally repugnant (plus politicians are beholden to their party which might object), such that existing power-holders would not accept a straightforward “money for institutional reform” trade.

But there may be some other version which, in practice, incentivizes power-holders to initiate governmental reform. Possibly by letting those power-holders keep their power for some length of time, and also receive the credit for the change. Or maybe a setup that targets those people before they take power, when they are more idealistic, and more inclined to make an agreement to cause reform, conditional on all their peers doing the same, in the style of a free-state agreement.

If we could find a repeatable “template” for making such deals, it might unlock the ability to iteratively improve existing governmental structures.

I’m not aware of any academic research in this area (both historical case studies of how these kinds of shifts have occurred in the past and analytic models of how to incentivize such changes seem quite useful to me), nor any practical projects aiming for something like this.

Intelligence enhancement

One might posit that the sorts of incentive problems that lead to bizarre institutional policies are the inevitable result of the fact that doing better requires understanding many abstract, non-intuitive concepts and/or careful reasoning in complicated domains, and the average person is of average intelligence, which is insufficient to systematically identify better policies and institutional set-ups over worse ones at the current margin.

In this view, the fundamental problem is that our civilizational decision making processes are much worse than is theoretically possible, because we are collectively not smart enough to do better. Some of us can identify the best policies (or at least determine that one policy is better than another), some of the time, but that relies on understanding that is esoteric to many more people, including many crucial decision makers.

But if the average intelligence of the population as a whole were higher, more good ideas would seem obviously good to more people, and it would be substantially easier to get a critical mass of acceptance of sane policies on the object level, as well as better information processing mechanisms. (For instance, if the IQ curve were shifted 35 points to the right, many more people would be able to “see at a glance” why prediction markets are an elegant way of aggregating information.)

More intelligence -> More understanding of important principles -> Saner policies

So it might be that the most effective lever on civilizational sanity is intervening on biological intelligence.

The most plausible way to do this is via widespread genetic enhancement, with either selection methods like iterated embryo selection, or direct gene editing using methods like CRISPR.

My current understanding is that these methods are bottlenecked on our current knowledge of the genetic predictors of intelligence: if we knew those more completely, we would basically be able to start human genetic engineering for intelligence. It seems like that knowledge is going to continue to trickle in as we get better at doing genomic analysis and collect larger and larger data sets. [Note: this is just my background belief. Double-check] Possibly, better machine learning methods will lead to a sudden jump in the rate of progress on this project?

On the face of it this suggests that any project that could provide a breakthrough in decoding the genetic predictors of intelligence could be high leverage.

Aside from that, there’s some risk that society will fork down a path in which human genetic enhancement is considered unethical, and will be banned. I’m not that worried about this possibility, because as long as some people / groups are doing this for their children there is a competitive pressure to do the same, and I think it is pretty unlikely that China, which is competitive, at the national level, with the rest of the world, and in which families already regularly exert huge efforts to give their children competitive advantages relative to societies at large, will forgo this opportunity. And if China invests in human genetic enhancement, the US will do the same out of a fear of Chinese dominance.

Some other avenues for human intelligence enhancement include nootropics, which seem much less promising given the basic algernonic argument, and brain-computer interfaces like Neuralink. Of the latter, it is currently unknown which dimensions of human cognition can be readily improved, and whether such augmentation will lend itself to wisdom or whatever the precursors to civilizational sanity are.

There’s also the possibility of using sufficiently aligned AI assistants to augment our effective intelligence and decision making. Absent our alignment research giving us very clear criteria for aligned systems, this seems like a very tricky proposition, because of the problems described in this post. But in worlds where AI technology continues to improve along its current trajectory, it might be that using limited AI systems as leverage for improving our decision making and research apparatus, to further improve our alignment technologies, is the best way to go.

A note on improving public understanding by methods other than intelligence enhancement:

Possibly there are other ways to substantially increase each person’s individual intellectual reach, so that we can all come to understand more, without increasing biological intelligence. Things in the vein of “better education”. 

I’m pretty dubious of these. 

I think I have far above average skill in communicating (both teaching and being taught) complex or abstract ideas. But even being pretty skilled, for a human, it is just hard. Even when the conditions are exceptional (a motivated student working one-on-one with a skilled tutor who understands the material and can model / pace to the student’s epistemic state), it just takes many focused hours to grasp many important concepts.

I think that any educational intervention effective enough to actually move the needle on civilizational sanity would have to be very radical: so transformative that it would be a general boost in a person’s learning ability, i.e. an increase in effective intelligence. That said, if anyone has ideas for interventions that could increase most people’s intellectual grasp, I would love to hear them.

(…Possibly a dedicated and well-executed campaign to educate the public at large in some small set of extremely important concepts, with the goal of shifting what sorts of explanations sound plausible to most people (raising the standard for what kinds of economic claims people can make in public with a straight face, for instance), would be helpful on the margin. But this seems to me like an enormous undertaking which would require pedagogical and mass-communication knowledge that I don’t know if anyone has. And I’m not sure how helpful it would be. Even if the whole world understood econ 101, the real world is more complicated than econ 101, such that I don’t know how much that alone would aid people’s assessment of which policies are best. I suppose it would cut out some first-order class of mistakes.)

I do think there are definitely ways to increase our collective intellectual reach, so that societies can systematically land on correct conclusions without increasing any individual person’s intellectual reach or understanding. These include the governance mechanisms I alluded to in the last section. 

There might also exist society-wide “public services” that could do something like this while side-stepping government bureaucracy entirely, like the dream of Arbital. I’m not sure how optimistic I should be about those kinds of interventions. The only comparable historical examples that I can think of are Wikipedia and public libraries. Both of these seem like clearly beneficial public goods with huge flow-through effects, which make information easily available to people who want it and wouldn’t otherwise have access. But neither one seems to have obviously improved high-level civilizational decision making relative to the counterfactual.

Clearing “Trauma”??

[The following section is much more speculative, and I don’t yet know what to think of it.]

There’s another story in which the main source of our world’s dysfunction is self-perpetuating trauma patterns. 

There are many variations of this story, which differ in important details. I’ll outline one version here, noting that something like this could be true without this particular story being true.

According to this view…

virtually everyone is traumatized (or if you prefer, “socialized”), into dysfunctional and/or exploitative behavior patterns, to greater or lesser degrees, in the course of growing up. 

The central problem isn’t (just) that everyone is following their local self-interest in globally destructive systems. It is actually much worse than that: people are conditioned in such a way that they are not even acting in their narrow self-interest. Instead, humans myopically focus on goals, and execute strategies, that are both 1) globally harmful and 2) not even aligned with their own “true” reflective preferences, due to false assumptions underlying their engagement with the world. This myopia also inhibits their ability to think clearly about parts of the world that are related to their trauma.

(As a case in point, I think it is probably the case that there are lots of people who are aggressively pursuing AGI, and who instinctively flinch away from any thought that AGI might be dangerous, because they have a deep, unarticulated belief that if they can be successful at that, their parents will love them, or they won’t feel lonely any more, or something like that.)

They’ve been conditioned to feel threatened or triggered by a huge class of behaviors that are globally productive, like accurate tracking of harms, and many kinds of positive-sum arrangements.

Furthermore, the core reason why most people can’t seem to think or to have “beliefs in the sense of anticipations about the world” is not (mostly) a matter of intelligence, but rather that their default reasoning and sense-making functions have been damaged by the institutions and social contexts in which they participate (school, for instance).

Those traumatizing contexts are not designed by conscious malice, but they are also not necessarily incidental. It’s possible that they have been optimized to be traumatizing, via unconscious social hill-climbing.

This is because trauma-patterns are replicators: they have enough expressive power to recreate themselves in other humans, and are therefore subject to a selection pressure that gradually promotes the variations that are most effective at propagating themselves. (Furthermore, there’s a hypothesis that for a traumatized mind, one of the best ways to control the environment to make it safe is to similarly traumatize people in the environment.) The net result is horrendous systems of hurt people hurting people, as a way to pass on that particular flavor of hurt to future generations.

Part of the hypothesis here is that these trauma patterns have always been a thing in human societies, but there has also typically been a counter-force, namely that if you need to work together and have a good understanding of the physical world to survive in a harsh environment, your epistemology can’t be damaged too badly, or you’ll die. But in the modern world, we’ve become so wealthy, and most people have become so divorced from actual production, that that counter-force is much diminished.

Implications for Improving the World

If this story is true, governmental reform is likely to fail for seemingly mysterious reasons, because there is selection pressure optimizing against good institutional epistemology, over and above bureaucratic inertia and the incentives of entrenched power-holders. If you don’t defuse the underlying trauma-patterns, any system that you try to build will either fail or be subverted by those trauma-patterns.

And under this story, it’s unclear how much intelligence enhancement would help. All else being equal, it seems (?) that being smarter helps in developmental work, and healing from one’s personal traumas, but it might also be the case that greater intelligence enables more efficient propagation of trauma patterns. 

If this story is largely correct, it implies that the actual bottleneck for the world is understanding trauma and trauma resolution methods well enough to heal trauma-patterns at scale. If we can do that, the agency and intellect of the world (which is currently mostly suppressed), will be unblocked, and most of the other problems of the world will approximately solve themselves.

I also don’t know to what extent there already exist methods for reliably and rapidly resolving trauma patterns, and the degree to which the bottleneck is actually one of 1-to-n scaling rather than 0-to-1 discovery. Certainly there are various methods that at least some people have gotten at least some benefit from, though it remains unclear how much of the total potential benefit even the best methods provide to the people who have gotten the most from them.

I don’t know what to think of all of this yet: the degree to which trauma is at the root of the world’s ills, the degree to which things have actually been optimized to be traumatizing as opposed to ending up that way by accident, or even whether “trauma” is a meaningful category pointing at a real phenomenon that is different from “learning” in a principled way.

I’ll note that even if the strong version of this story is not correct, it might still be the case that many people’s intellectual capability is handicapped by psychological baggage. So research into effective trauma-resolution methods may be an effective line of attack on improving the world’s intellectual capability. For instance, finding a non-scalable method for reliably resolving trauma might be an important win, because at minimum, we could apply it to all of the AI safety researchers. This might be one of the possible gains on the table for speeding progress on the alignment problem.

(Though this is also something to be careful of, since such methods would likely have some kind of psychological side effects, and we don’t necessarily want to reshape the psyches of earth’s contingent of alignment researchers all in the same way. I worry that we might have already done this to some degree with circling: Circling seems quite good and quite helpful, but I think that we should be concerned that if we make a mistake about what directions are good to push the culture of our small AI safety community, we’re likely to destroy a lot of value.)

The Rise of China??

In the first section describing what I meant by civilizational sanity up above, I noted “sensible response to COVID” as one indicator of civilizational sanity. Notably, China’s COVID response seems, overall, to have been much more effective than the West’s.

This doesn’t seem like an aberration, either. As a non-expert foreigner looking in, it looks like China’s society/government is overall more like an agent than the US government. It seems possible to imagine the PRC having a coherent “stance” on AI risk. If Xi Jinping came to the conclusion that AGI was an existential risk, I imagine that that could actually be propagated through the Chinese government, and Chinese society, in a way that has a pretty good chance of leading to strong constraints on AGI development (like the nationalization, or at least the auditing, of any AGI projects).

Whereas if Joe Biden, or Donald Trump, or anyone else who is anything close to a “leader of the US government”, got it into their head that AI risk was a problem…the issue would immediately be politicized, with everyone in the media taking sides on one of two lowest-common-denominator narratives, each straw-manning the other. One side would attempt to produce (probably senseless) legislation in the frame of preventing the bad guys from doing bad things, while the other side would go to absurd lengths to block them as a matter of principle, and in the end we’d be left with some regulation on tech companies that doesn’t cleave to the actual shape of the problems at all, and that pisses off researchers who are frustrated that this anthropomorphizing “AI risk” hubbub just made their lives much harder, alienating them.

(One might think that this is actually a national security issue, and it would be taken more seriously than that, but COVID was a huge public health issue, and we managed to politicize wearing masks.)

So, maybe it would be good for the world if China was the dominant world power?

I think that overall, China’s society, and high level decision making is currently saner than that of the western world. So maybe on the margin, the world is better off if China were more dominant. 

However, I have a number of reservations.

  1. China’s human rights record is not great. Apparently, there is an ongoing genocide of the Uighurs, happening right now. My deontology is pretty reluctant to put mass murderers in charge of the world.
    1. I’m not sure how to think about this. Genocide is extremely bad. And furthermore we have a strong, coordinated norm to censure and take action against it (although, obviously not that strong, since I don’t know of a single person who has taken any action other than (occasionally) tweeting news articles, in this case). But also, I’m not sure whether I should just parse this as standard practice for great powers / ruling empires. The US has committed similarly bad atrocities in its history (slavery and the extermination/relocation of the Indians come to mind), and as far as I know, continues to commit similar atrocities. And the stakes are literally astronomical. Does the specter of extinction and the weight of all future earth-originating civilization mean we should just neglect contemporary genocide in our realpolitik calculations? I’m not comfortable with that, but I don’t know what to think about it.
  2. I don’t have a strong reason to expect that China’s institutions are fundamentally better functioning than the US’s; I think they’re just younger. If China is exhibiting the kind of functionality and decisiveness that the US was enjoying 60 years ago, then it seems pretty plausible that 60 years from now (or maybe sooner than that, on the general principle that the world is moving faster now), the Chinese system will be similarly sclerotic and dysfunctional.
    1. Indeed, we might make a more specific argument that institutions are able to remain functional so long as there is growth, because a growing pie means everyone can win. But when growth slows or stops there’s no longer a selection pressure for effectiveness, and institutions entrench themselves because rent seeking is a better strategy. (Or maybe the causality goes the other way: there’s a continual, gradual, increase in rent-seeking as actors entrench their power-bases, which gradually cuts out production, until all (or almost all) that’s left is rent-seeking.) In any case, I think China has got to be nearing the top of its explosive s-curve, and I don’t expect its national agency to be robust to that.
  3. I would guess, not knowing much more than a stereotype of Chinese culture, that even if it is saner and more effective than western culture right now, the west has more of the generators that can lead to further increases in civilizational sanity. I might be totally off base here, but the East’s emphasis on conformity and social hierarchy seems like it would make it even MORE resistant to, say, the wide-scale adoption of prediction markets than the US is. (Though maybe the ruling party is enough of an unincentivized incentivizer to overcome this effect?) I suspect that it is even less likely to generate the kind of iconoclastic thinkers who would think up the idea of prediction markets in the first place. It would be quite bad if we got some boost in civilizational sanity with the rise of China, but Chinese dominance then curtailed any further improvement on that dimension.
  4. It is currently unclear to me how much it matters which culture the intelligence explosion takes place in.
    1. Under the assumption of a strong attractor in the human CEV, it seems like it doesn’t matter much at all: we’re all, currently, so radically confused about Goodness, that the apparently-huge cultural differences are just noise. And even if that’s not true, I would guess that the differences between my ideal future, and some human-descended society, are probably massively outweighed by the looming probability of extinction and a sterile universe. Chinese people live happy lives in China, now, and have lived happy lives throughout history, even if they tolerate a level of conformity and restriction-of-expression that I would find stifling, to say the least.
    2. However, I think it might not be an exaggeration to say that the CPC believes that thoughts should be censored to serve the state. I can imagine technologically augmented versions of thought control that are so severe as to permanently damage human civilization’s ability to think together, which might constitute the sort of irreparable “damage” that prevents us from deliberating to discover, and then executing on, a good future. If this sort of technology is more likely to come from China than from the west, Chinese supremacy might be disastrous.
    3. It does seem really important that AGI not lock the future into an inescapable immortal dictatorship (Probably? Maybe most people just live basically happy lives in an immortal dictatorship?). And I want to track if that is more likely to result from an intelligence explosion directed by China than by my native culture.

Summing up

  • The problem facing humanity in this era is figuring out how to exit the acute risk period, systematically, instead of by luck. 
  • The only ways that I can see to do this, depend on aligned AI or a much saner human civilization. 
  • So the problem breaks down into two subproblems: solve AI alignment or achieve enough civilizational sanity.
  • AI alignment research is going apace, and if there are ways to speed it up, that would be great.
  • I can currently see four lines of attack on civilizational sanity: unblocking innovation in governance, intelligence enhancement, and possibly widespread trauma resolution or Chinese ascendancy.
  • All of those plans might turn out to be on-net bad for the world, on further reflection.


Some of my questions for going forward:

  1. How long until transformative AI arrives?
  2. Are there tractable ways to speed technical AI alignment substantially?
  3. Are there tractable ways to unblock governance experimentation?
  4. Follow up on charter city projects
  5. What’s blocking seasteading? Is it cost, as I believe?
  6. How large are the expected flow-through effects of governmental sanity interventions on other sectors?
  7. Conditional on unblocking innovation in governance, how long is it likely to take for the best innovations to propagate outward until they are standard best practices?
  8. What’s the bottleneck for human genetic intelligence augmentation?
  9. Along what dimensions would Neuralink improve human capabilities?
  10. Is “trauma” a natural kind? To what extent is it true that psychological trauma is driving exploitative and counter-productive organizational patterns in the world?
  11. How much saner is China? How long will the Chinese system remain “alive”?
  12. How different will the long term future be, if the intelligence explosion happens in one culture rather than another?


[1] –  Or some civilization or other mechanism, bearing human values.

[2] –  Though of course, there isn’t a linear relationship between the individual progress bars, and total victory. We might be “70%” of the way to a full solution to both problems (whatever that means), but between the two, not have enough of the right pieces to get a combined solution that lets us exit the critical risk period. That’s why it is only allegorical.

[3] – And, in contrast, I sometimes talk with people who are so pessimistic about alignment work, that they take it for granted that the thing to do is take over the world by conventional means.

Psychoanalyzing, people seem to gravitate to the line of attack that is within their skillset, and therefore feels more comfortable to think about. This seems like a perfectly good heuristic for specialization, but it doesn’t seem like a particularly good way to identify which approach is more tractable in the abstract.

How do we prepare for final crunch time? – Some initial thoughts

[epistemic status: Brainstorming and first draft thoughts.

Inspired by something that Ruby Bloom wrote and the Paul Christiano episode of the 80,000 Hours podcast.]

One claim I sometimes hear about AI alignment [paraphrase]:

“It is really hard to know what sorts of AI alignment work are good, this far out from transformative AI. As we get closer, we’ll have a clearer sense of what AGI / Transformative AI is likely to actually look like, and we’ll have much better traction on what kind of alignment work to do. In fact, it might be the case that MOST of the work of AI alignment is done in the final few years before AGI, when we’ve solved most of the hard capabilities problems already and we can work directly, with good feedback loops, on the sorts of systems that we want to align.”

Usually this is taken to mean that the alignment research that is being done today is primarily to enable or make easier future, more critical, alignment work. But “progress in the field” is only one dimension to consider in boosting the work of alignment researchers in final crunch time.

In this post I want to take the above posit seriously, and consider the implications. If most of the alignment work that will be done is going to be done in the final few years before the deadline, our job in 2021 is mostly to do everything that we can to enable the people working on the problem in the crucial period (which might be us, or our successors, or both) so that they are as well equipped as we can possibly make them.

What are all the ways that we can think of that we can prepare now, for our eventual final exam? What should we be investing in, to improve our efficacy in those final, crucial, years?

The following are some ideas.


For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.

A different kind of work

Most current AI alignment work is pretty abstract and theoretical, for two reasons. 

The first reason is a philosophical / methodological claim: There’s a fundamental “nearest unblocked strategy” / overfitting problem. Patches that correct clear and obvious alignment failures are unlikely to generalize fully: you’ll only have constrained unaligned optimization to channels that you can’t recognize. For this reason, some claim, we need to have an extremely robust, theoretical understanding of intelligence and alignment, ideally at the level of proofs.

The second reason is a practical consideration: we just don’t have powerful AI systems to work with, so there isn’t much in the way of tinkering and getting feedback.

The second objection becomes less relevant in final crunch time: in this scenario, we’ll have powerful systems 1) that will be built along the same lines as the systems that it is crucial to align and 2) that will have enough intellectual capability to pose at least semi-realistic “creative” alignment failures (i.e., current systems are so dumb, and live in such constrained environments, that it isn’t clear how much we can learn about aligning literal superintelligences from them).

And even if the first objection ultimately holds, theoretical understanding often (usually?) follows from practical engineering proficiency. It seems like it might be a fruitful path to tinker with semi-powerful systems trying out different alignment approaches empirically, and tinkering to discover new approaches, and then backing up to do robust theory-building given much richer data about what seems to work.

I could imagine sophisticated setups that enable this kind of tinkering and theory building. For instance, I imagine a setup that includes:

  • A “sandbox” that affords easy implementation of many different AI architectures and custom combinations of architectures, with a wide variety of easy-to-create, easy-to-adjust training schemes, and a full suite of interpretability tools. We could quickly try out different safety schemes, in different distributions, and observe what kinds of cognition and behavior result.
  • A meta AI that observes the sandbox, and all of the experiments therein, to learn general principles of alignment. We could use interpretability tools to use this AI as a “microscope” on the AI alignment problem itself, abstracting out patterns and dynamics that we couldn’t easily have teased out with only our own brains. This meta system might also play some role in designing the experiments to run in the sandbox, to allow it to get the best data to test its hypotheses.
  • A theorem prover that would formalize the properties and implications of those general alignment principles, to give us crisply specified alignment criteria by which we can evaluate AI designs.

Obviously, working with a full system like this is quite different from abstract, purely theoretical work on decision theory or logical uncertainty. It is closer to the sort of experiments that the OpenAI and DeepMind safety teams have published, but even that is a pretty far cry from the kind of rapid-feedback tinkering that I’m pointing at here.

Given that the kind of work that leads to research progress might be very different in final crunch time than it is now, it seems worth trying to forecast what shape that work will take and trying to see if there are ways to practice doing that kind of work before final crunch time.


Obviously, when we get to final crunch time, we don’t want to have to spend any time studying fields that we could have studied in the lead-up years. We want to have already learned all the information and ways of thinking that we’ll want to know, then. It seems worth considering what fields we’ll wish we had known when the time comes.

The obvious contenders:

  • Machine Learning
  • Machine Learning interpretability
  • All the Math of Intelligence that humanity has yet amassed [Probability theory, Causality, etc.]

Some less obvious possibilities:

  • Neuroscience?
  • Geopolitics, if it turns out that which technical approach is ideal hinges on important facts about the balance of power?
  • Computer security?
  • Mechanism design in general?

Research methodology / Scientific “rationality”

We want the research teams tackling this problem in final crunch time to have the best scientific methodology and the best cognitive tools / habits for making research progress, that we can manage to provide them.

This maybe includes skills or methods in the domains of:

  • Ways to notice as early as possible if you’re following an ultimately-fruitless research path
  • Noticing / Resolving / Avoiding blindspots
  • Effective research teams
  • Original seeing / overcoming theory blindness / hypothesis generation
  • ???


One obvious thing is to spend time now, investing in habits and strategies for effective productivity. It seems senseless to waste precious hours in our acute crunch time due to procrastination or poor sleep. It is well worth it to solve those problems now. But aside from the general suggestion to get your shit in order and develop good habits now, I can think of two more specific things that seem good to do.

Practice no-cost-too-large productive periods

There may be trades that could make people more productive on the margin, but are too expensive in regular life. For instance, I think that I might conceivably benefit from having a dedicated person whose job is to always be near me, so that I can duck with them with 0 friction. I’ve experimented a little bit with similar ideas (like having a list of people on call to duck with), but it doesn’t seem worth it for me to pay a whole extra person-salary to have the person be on call, and in the same building, instead of on-call via zoom.

But it is worth it at final crunch time.

It might be worth it to spend some period of time, maybe a week, maybe a month, every year, optimizing unrestrainedly for research productivity, with no heed to cost at all, so that we can practice how to do that. This is possibly a good thing to do anyway, because it might uncover trades that actually, on reflection are worth importing into my regular life.

Optimize rest

One particular subset of personal productivity, that jumps out at me: each person should figure out their actual optimal cadence of rest.

There’s a failure mode that ambitious people commonly fall into, which is working past the point when marginal hours of work are negative. When the whole cosmic endowment is on the line, there will be a natural temptation to push yourself to work as hard as you can, and forgo rest. Obviously, this is a mistake. Rest isn’t just a luxury: it is one of the inputs to productive work.

There is a second level of this error in which one, grudgingly, takes the minimal amount of rest time, and gets back to work. But the amount of rest time required to stay functional is not the optimal amount of rest, the amount that maximizes productive output. Eliezer mused years ago, that he felt kind of guilty about it, but maybe he should actually take two days off between research days, because the quality of his research seemed better on days when he happened to have had two rest days preceding.

In final crunch time, we want everyone to be resting the optimal amount that actually maximizes area under the curve, not the one that maximizes work-hours. We should do binary search now, to figure out what the optimum is.
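One way to read “binary search” here, as a minimal sketch: if output as a function of rest is single-peaked (it rises, then falls), you can binary search on whether one more rest day still helps. Everything in the block below is an assumption made for illustration. `toy_weekly_output` is a made-up curve, not a measurement, and real measurements would be noisy enough that each comparison would need averaging over several weeks.

```python
def find_optimal_rest(output, lo, hi):
    """Return the integer in [lo, hi] that maximizes a single-peaked `output`.

    Binary search on the marginal gain: if one more rest day still
    increases output, the peak lies to the right; otherwise it is at
    `mid` or to the left.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if output(mid + 1) > output(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo


def toy_weekly_output(rest_days):
    """Hypothetical model: work days times a quality factor that saturates with rest."""
    work_days = 7 - rest_days
    quality = 1 - 0.5 ** (rest_days + 1)
    return work_days * quality


best = find_optimal_rest(toy_weekly_output, 0, 7)
print(best, toy_weekly_output(best))
```

The point of the toy curve is only that the peak of total output sits somewhere other than at the minimum rest needed to stay functional, or at the maximum number of work-hours.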

Also, obviously, we should explore to discover highly effective methods of rest, instead of doing whatever random things seem good (unless, as it turns out, “whatever random thing seems good” is actually the best way to rest).

Picking up new tools

One thing that will be happening in this time is that there will be a flurry of new AI tools that can radically transform thinking and research, perhaps increasingly radical tools coming at a rate of once a month or faster.

Being able to take advantage of those tools and start using them for research immediately, with minimal learning curve, seems extremely high leverage.

If there are things that we can do that increase the ease of picking up new tools and using them to their full potential (instead of, as is common, using only the features afforded by your old tools and only very gradually

Some thoughts (probably bad):

  • Could we set up our workflows, somehow, such that it is easy to integrate new tools into them? Like if you already have a flexible, expressive research interface (something like Roam?), and you’re used to regular changes in capability to the backend of the interface?
  • Can we just practice? Can we have a competitive game of introducing new tools, and trying to orient to them and figure out how to exploit them as creatively as possible?
  • Probably it should be some people’s full time job to translate cutting edge developments in AI into useful tools and practical workflows, and then to teach those workflows to the researchers?
  • Can we design a meta-tool that helps us figure out how to exploit new tools? Is it possible to train an AI assistant specifically for helping us get the most out of our new AI tools?
  • Can we map out the sort of constraints on human thinking and/or the sorts of tools that will be possible, in advance, so that we can practice with much weaker versions of those tools, and get a sense of how we would use them, so that we’re ready when they arrive?
  • Can we try out new tools on psychedelics, to boost neuroplasticity? Is there some other way to temporarily weaken our neural priors? Maybe some kind of training in original seeing?

Staying grounded and stable in spite of the stakes

Obviously, being one of the few hundred people on whom the whole future of the cosmos rests, while the singularity is happening around you, and you are confronted with the stark reality of how doomed we are, is scary and disorienting and destabilizing.

I imagine that that induces all kinds of psychological pressures, that might find release in any of a number of concerning outlets: by deluding one’s self about the situation, by becoming manic and frenetic, by sinking into immovable depression.

We need our people to have the virtue of being able to look the problem in the eye, with all of its terror and disorientation, and stay stable enough to make tough calls, and make them sanely.

We’re called to cultivate a virtue (or maybe a set of virtues) of which I don’t know the true name, but which involves courage and groundedness, and determination-without-denial.

I don’t know what is entailed in cultivating that virtue. Perhaps meditation? Maybe testing one’s self at literal risk to one’s life? I would guess that people in other times and places, who needed to face risk to their own lives and that of their families, did have this virtue, or some part of it, and it might be fruitful to investigate those cultures and how that virtue was cultivated.


Tinder hookups displace hookups that are more likely to lead to relationships?

[epistemic status: completely unverified hypothesis straight out of my ass. Many of these “facts” are subjective impressions that may turn out to be just untrue. Very sloppy fact checking.]

According to The Atlantic, we’re currently in the midst of a sex recession. Fewer people are having sex, and those who are, are having less of it, in comparison to previous decades.

Furthermore, it looks to me that many in my generation are not on track to have kids and raise a family. [Note: that might be totally false. For one thing, people are marrying and having kids later, so maybe I just need to wait a half decade. There are articles saying that the birthrate is falling, but just eyeballing the graph, it looks like it has been hovering around 2 births per woman since 1972.]

I have the impression that fewer romantic pairings are happening. Fewer people are ending up in romantic relationships. Why might that be?

In a word, Tinder.

In 2012, Tinder was launched, and by the mid 2010s it had reached fixation. I posit that maybe swipe-based mobile apps had a number of large scale societal impacts that we’re just starting to see. Namely, hook-up pairings that result from Tinder-like apps are less likely to lead to long term relationships.

Pairing with people in your local social context increases the fodder for a robust long term relationship

Before Tinder, people hooked up with people in their local social environment: you met people in your dorm, or in the classes you were taking, or at your workplace, or in a bar, or at a party, or through friends.

But it seems that the popularity of Tinder-like apps must have displaced at least some of that activity. Now, if you want to hook up, you’re more likely to do it via an app.

I would guess that if you hook up with someone who lives in the same dorm as you, that hookup is much more likely to transition to an ongoing relationship. If you’re hooking up with someone that you’ve already interacted with a good deal, there’s the possibility of being attracted to a partner on the basis of attributes like shared interests, or positive character traits like generosity or humor. In contrast, on Tinder, approximately the only criteria for mate-evaluation are 1) looks (as embodied in a photo) and 2) chat game.

Hooking up with someone that you like from your interactions with them seems much more likely to lead to a long term relationship, compared to hooking up with someone almost solely on the basis of their photo. Having met in person, the two of you are more likely to share a lot of common context (similar interests, overlapping social group, similar priorities), which can turn a recurring hookup into an actual relationship.

Heavy power laws of sexual success push against relationships

Secondly, Tinder aggravates a power law distribution of male sexual success. There have always been “Chads”, who were particularly attractive to women. Men are, in general, less discriminating about sexual partners than women are, so those men would have casual sex with many partners. Many women are pairing with a few men, resulting in a sex-pairings graph with a small number of super-connectors, and a larger number of unconnected or loosely connected nodes (less attractive-to-women men, who are having much less sex than the population average).

However, I suspect that Tinder-like apps further consolidate that distribution of sexual success.

Tinder has a much larger pool to filter than an in-person context. Women, flipping through Tinder, can choose to match with only the very most attractive guys. In contrast, if they were going to a bar to hook up, they would, at best, be able to hook up with only the most attractive guy at that particular bar. And even then, only one woman at a bar could pair with the most attractive man at that bar at any given time, whereas on Tinder, a very successful guy could match with multiple women in a day, and have sex with all of them in sequence.

Hookups with highly sexually-successful chads seem unlikely to transition to long term relationships, because for those men, the opportunity cost of monogamy is much larger.

Additionally, even if those men do want to transition to a long-term relationship, they would only do it with (approximately) one out of hundreds of women. So, for most women hooking up with a hyper-sexually-successful man, the likelihood of that hookup transitioning into a relationship is very low.

Which means that Tinder allows women, as a whole, to hook up with the same number of men as women 2 decades ago, and on average, their partners are more attractive to them, but less likely to pair-bond with them in a long term relationship.

Since the transaction costs for having sex are lower when you’re already in a relationship, fewer people ending up in relationships means that there is less sex happening overall. And fewer relationships means fewer people getting married and having kids.

Some predictions that this model makes and other hypotheses


  • We should see that the power-law of sexual success for men has moved to be closer to winner-take-all since 2012. More men are having no sex or close to no sex, but the men who are having a lot of sex are having more of it than their peers-from-the-previous-cohort (is there a word for this?).
  • Fewer hookups are happening via the “traditional channels”
    • As a corollary of this, men must either be asking women out in person less, or women must be saying “yes” (to sex, if not to dates) less, or both.
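The first prediction could be made checkable by tracking a concentration statistic across cohorts. The sketch below uses the Gini coefficient, which is a generic inequality measure rather than a power-law fit, so it is a stand-in for “closer to winner-take-all”, not a test of the distribution’s shape. The function name `gini` and both lists of partner counts are hypothetical, made-up numbers chosen only to show the calculation.

```python
def gini(counts):
    """Gini coefficient of non-negative values: 0 = perfectly equal, near 1 = winner-take-all."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over ascending values, ranks 1..n.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up yearly partner counts for two cohorts of eight men (illustrative only).
cohort_before = [0, 1, 1, 2, 2, 3, 3, 8]
cohort_after = [0, 0, 0, 1, 1, 2, 4, 12]

print(gini(cohort_before), gini(cohort_after))
```

Both toy cohorts have the same total, so the comparison isolates how concentrated the pairings are rather than how many there are. The model predicts the real-world analogue of the second number should be higher than the first.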

Everything that I’m saying here is also compatible with the hypothesis:

“Most men and women mostly aim for casual sex in their 20s, and steer toward looking for longer term relationships and marriage partners in their 30s. Tinder has all of these impacts in the first phase, but doesn’t influence the second phase much.”

This is possible, I suppose, but given the common trope of couples that met in college, it seems like over the past 50 years, long term committed relationships have evolved from more casual relationships, which I think often start as hookups.

This tweet from the author of a paper about how people meet their partners does seem to match this story.

It looks like online dating is displacing “meeting through friends”, “meeting through / as coworkers”, and “meeting in college.”

Social reasoning about two clusters of smart people

Here’s a sketch. All of the following are generalizations, and some are wrong.

There are rationalists.

The rationalists are unusually intelligent, even, I think, for the tech culture that is their sort of backdrop. But they are, by-and-large, kind of aspy: on the whole, they are weak on social skills, or there is something broken about their social perceptions (broken in a different way for each one).

Rationalists rely heavily on explicit reasoning, and usually start their journeys pretty disconnected from their bodies.

They are strongly mistake theorists.

They have very very strong STEM-y epistemics. They can follow, and are compelled by, arguments. They are masterful at weighing evidence and coming to good conclusions on uncertain questions, where there is something like a data-set or academic evidence base.

They are honest.

They generally have a good deal of trust and assumption of good faith about other people, or they are cynical of humans and human behavior, using (explicit) models of “signaling” and “evo psych.”

I think they maybe have a collective blindspot with regards to Power, and are maybe(?) gullible (related to the general assumption about good faith). I suspect that rationalists might find it hard to generate the hypothesis that “this real person right in front of me, right now, is lying to me / trying to manipulate me.”

They are, generally, concerned about x-risk from advanced AI, and track that as the “most likely thing to kill us all”.


There’s also this other cluster of smart people. This includes Leverage-people, and some Thiel people, and some who call themselves post rationalists.

They are more “humanities” leaning. They probably think that lots of classic philosophy is not only good, but practically useful (where some rationalists would be apt to deride that as the “rambling of dead fools”).

They are more likely to study history or sociology, than math or Machine Learning.

They are keenly aware of the importance of power and power relations, and are better able to take ideology as object, and treat speech as strategic action rather than mere representation of belief.

Their worldview emphasizes “skill”, and extremely skilled people, who shape the world.

They are more likely to think of “beliefs” as having a proper function doing something other than reflecting the true state of the world, for instance, facilitating coordination, or producing an effective psychology. The rationalist would think of instrumentally useful false beliefs as something that is kind of dirty.

They tend to get some factual questions wrong (as near as I can tell): one common one is disregarding IQ, and positing that all mental abilities are a matter of learning.

These people are much more likely to think that institutional decay or civilizational collapse is more pressing than AI.


It seems like both these groups have blindspots, but I would really like to have a better sense of the likelihood of both of these disasters, so it would be good if we could get all the virtues into one place, to look at both of them.



A view of the main kinds of problems facing us

I’ve decided that I want to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with them.

This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.

There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have the problem of civilizational sanity.

Preparadigmatic science

There are a number of hard scientific or scientific-philosophical problems that we’re facing down as a species.

Most notably, the problem of AI alignment, but also finding technical solutions to various risks caused by bio-technology, possibly getting our bearings with regards to what civilization collapse means and how it is likely to come about, possibly getting a handle on the risk of a simulation shut-down, possibly making sense of the large scale cultural, political, cognitive shifts that are likely to follow from new technologies that disrupt existing social systems (like VR?).

Basically, for every x-risk, and every big shift to human civilization, there is work to be done even making sense of the situation, and framing the problem.

As this work progresses it eventually transitions into incremental science / engineering, as the problems are clarified and specified, and the good methodologies for attacking those problems solidify.

(Work on bio-risk, might already be in this phase. And I think that work towards human genetic enhancement is basically incremental science.)

To my rough intuitions, it seems like these problems, in order of pressingness are:

  1. AI alignment
  2. Bio-risk
  3. Human genetic enhancement
  4. Social, political, civilizational collapse

…where that ranking is mostly determined by which one will have a very large impact on the world first.

So there’s the object-level work of just trying to make progress on these puzzles, plus a bunch of support work for doing that object level work.

The support work includes

  • Operations that make the research machines run (ex: MIRI ops)
  • Recruitment (and acclimation) of people who can do this kind of work (ex: CFAR)
  • Creating and maintaining infrastructure that enables intellectually fruitful conversations (ex: LessWrong)
  • Developing methodology for making progress on the problems (ex: CFAR, a little, but in practice I think that this basically has to be done by the people trying to do the object level work.)
  • Other stuff.

So we have a whole ecosystem of folks who are supporting this preparadigmatic development.

Civilizational Sanity

I think that in most worlds, if we completely succeeded at the pre-paradigmatic science, and the incremental science and engineering that follows it, the world still wouldn’t be saved.

Broadly, one way or the other, there are huge technological and social changes heading our way, and human decision makers are going to decide how to respond to those changes, possibly in ways that will have very long term repercussions on the trajectory of earth-originating life.

As a central example, if we more-or-less-completely solved AI alignment, from a full theory of agent-foundations, all the way down to the specific implementation, we would still find ourselves in a world where humanity has attained god-like power over the universe, which we could very well abuse, and end up with a much much worse future than we might otherwise have had. And by default, I don’t expect humanity to refrain from using new capabilities rashly and unwisely.

Completely solving alignment does give us a big leg up on this problem, because we’ll have the aid of superintelligent assistants in our decision making, or we might just have an AI system implement our CEV in classic fashion.

I would say that “aligned superintelligent assistants” and “AIs implementing CEV”, are civilizational sanity interventions: technologies or institutions that help humanity’s high level decision-makers to make wise decisions in response to huge changes that, by default, they will not comprehend.

I gave some examples of possible Civ Sanity interventions here.

Also, I think that some forms of governance / policy work that OpenPhil, OpenAI, and FHI have done count as part of this category, though I want to cleanly distinguish between pushing for object-level policy proposals that you’ve already figured out, and instantiating systems that make it more likely that good policies will be reached and acted upon in general.

Overall, this class of interventions seems neglected by our community, compared to doing and supporting preparadigmatic research. That might be justified. There’s reason to think that we are well equipped to make progress on hard important research problems, but changing the way the world works, seems like it might be harder on some absolute scale, or less suited to our abilities.





Why is the media consumption of adult millennials the same as it was when they were children?

[Random musings.]

Recently, I’ve seen ads for a number of TV shows that are re-instantiations of TV shows from the early 2000s, apparently targeted at people in their late twenties and early thirties, today.

For instance, there’s a new Lizzie McGuire show, that follows a 30-year-old Lizzie as a practicing lawyer. (In the original show, she was a teenager in high school.) In a similar vein, there’s a new That’s So Raven show, about Raven being a mom.

Also, recently, Disney released a final season of Star Wars: The Clone Wars (which ran from 2008 to 2014).

These examples seem really interesting to me, because this seems like a new phenomenon. Something like, Millennials unironically like and are excited about the same media that they liked when they were kids. I think this is new. My impression is that it would be extremely unusual for a 30 year-old in 1990 to show similar enthusiasm for the media they consumed as a 12 year old. I imagine that for that person there is a narrative that you are supposed to “grow out of childish things”, and a person who doesn’t do that is worthy of suspicion. (Though I wasn’t there in 1990, so maybe I’m mis-modeling this.)

My impression (which is maybe mistaken), is that Millennials did not “grow up” in the sense that earlier generations did. Instead of abandoning their childhood interests to consume “adult media”, they maintained their childhood interests into their 30s. What could be going on here?

  • (One thing to note is that all three of the examples that I gave above are not just Disney properties, but specifically Disney+ shows. Maybe this is a Disney thing, as opposed to a Millennial thing?)

Some hypotheses:

  • One theory is that in the streaming era, demographics are much more fragmented, and there is an explosion of content creation for every possible niche, instead of aiming for broad appeal. So while there always would have been some people who are still excited about the content from their childhood, now media companies are catering to that desire, in order to capture that small demographic.
  • Another possibility is that the internet allowed for self-sustaining fandoms. In the past, if you liked a thing, at best you could talk about it with your friends, until that content ended and your friends moved on. But with the internet, you could go on message boards, and youtube, and reddit, and be excited about the things you love, with other people who love those things, even decades after they aired. The internet keeps your childhood fresh and alive for you, in a way that wasn’t really possible for previous generations.
  • Maybe being a geek became destigmatized. I think there is one group of adults in 1990 that would be unironically excited about the content that they enjoyed as kids and teen-agers: Nerds, who still love Star Wars, or Star Trek, or comic books, or whatever. (I posit that this is because nerds tend to like things because of how natively cool they seem, which is pretty stable over a lifetime, as opposed to tracking the Keynesian beauty contest of which things are popular with the zeitgeist / which things are cool to like, which fluctuates a lot over years and decades.) For some reason (probably related to the above bullet point), being a geek became a lot less socially stigmatized over the early 2000s, and there was less social backlash for liking nerdy things, and for being unironically excited about content that was made for children.
    • I feel like this is deeply related to sex. I posit that the reason that most young men “grow out of childish things”, is that when they become interested in girls, they start to focus near-exclusively on getting laid, and childish interests are a liability to that. (Nerds either 1) care more about the things that they like, so that they are less willing to give them up, even for sex or 2) are more oblivious of the impact that their interests have on their prospects for getting laid). But I have the sense that unironically liking your childhood media is less of a liability to your sex-life in 2000, than it was in 1990, for reasons that are unclear.
    • (Again, maybe it is because the internet allows people to live in communities that also appreciate that media, or maybe because nerds provided a ton of social value and can get rich and successful, so being a nerd is less stigmatized on the dating market, or maybe because special effects got so good that the things that were cool to nerds are now more obviously cool to everyone (eg superhero movies have mass appeal).)
  • Maybe the content from the early 2000s is just better, in some objective sense, than the content of the 1970s – 1980s. Like maybe my dad grew out of the content that he watched as a kid, because it was just less sophisticated, whereas the content that my generation watched as kids is more interesting to adults?
  • Maybe the baby boomers had an exciting adult world to grow into, which was more compelling than their childhood interests. Millennials feel adrift in the world, and so default to the media they liked as kids, because they don’t have better things to do?


Impressions on what is happening in the US in this decade

Epistemic status: wild-eyed inside view impressions, based on narrative and stereotype, mostly devoid of hard facts. I expect hard facts to change my view. [I also note that this story basically accords with my grey-tribe ideology / worldview.]

Over the past couple of weeks, I’ve started following twitter for the first time, and (relatedly) I’ve been reading a lot of Venkatesh Rao’s writing, which is mostly new to me. This combined with the fact that history in 2020 seems to be “moving faster” than it used to, has caused me to start thinking about society, and social class, and some other topics that I’ve not thought much about before.

This shitty post represents an outline of my current, tentative, view of what the heck is going on in the United States these days. As noted in the epistemic status, this synthesis is based on my subjective impressions watching the world unfold, more than rigorous analysis, and accordingly I’m missing a lot of complexity, at best, and totally off base at worst. But nevertheless, this kind of unrigorous making sense of things seems like a good starting point, to guide my empiricism.

Optimism and social stability

There’s a lot of talk about inequality, and the social unrest that income inequality foments. But I think that inequality of income, or of wealth, is a little bit of a red herring. I posit that social unrest mostly stems from pessimism about one’s personal life outcomes.

When folks expect their lot in life to improve, decade by decade, and when they have an expectation that if they work hard, their children can enjoy a better standard of living than they do, they’re pretty content. So long as that is true, they don’t care that much if other people are wealthier than them (especially people who live far away, but I think this is also true of literal neighbors).

Upsetting the social order is risky and dangerous, and is therefore an action of last resort. Most people don’t actually like violence and will avoid it if they can. If there are opportunities for social advancement, they’ll take them.

However, if those opportunities are lacking, for some reason or another, and people feel like they can’t get ahead, they’ll feel frustrated. They have a sense that something unfair is happening, and are likely to adopt the mindset of the pie fallacy: there’s only so much to go around, and one person’s wealth implies another person’s poverty.

In fact, while the pie-fallacy is a fallacy in the general case, the 0-sum mindset is a pretty accurate summary of one’s situation when there are not opportunities to improve your lot in life. Conditioning on that premise, your wellbeing does actually tradeoff against other people’s resources. If there’s some blocker to creating new value, the only thing left is redistribution, either by legal means like taxation, or illegal, like outright revolution.

I think that it is this frustration, of a lack of upward social mobility, much more than inequality specifically, that leads to revolution or attempts at redistribution.

Unfortunately, I think that huge swaths of Americans actually find themselves in this position, of being unable to improve their lot. The American dream, that you can work hard for most of your life so that your kids can have a better life than you, feels like an empty promise.

On the red-tribe side of things, this looks like the factory worker in the rust belt, whose job was outsourced to a foreign country, leaving him without a clear way to support himself. And as the economic centers of his world dried up, he watched as his community became a shadow of its former vibrant self. For him, the American dream seems to have gone wrong somehow. Somehow he’s worse off than his father was.

On the blue-tribe side of things, this is embodied by the millennial in one of the coastal mega-cities, working as part of the gig economy or in dismal retail, who is saddled with enormous college debt, which somehow didn’t give her much career capital, and who is paying exorbitant rent for a small room in a shared apartment, or, alternatively, still living with her parents. A child of the nineties, she thought, and her parents thought, that if she worked hard and went to a good college, the world was her oyster. But somehow it doesn’t seem to be working out that way.

I imagine that both of these people feel like they are running in place: they were promised that if they worked hard, they could have a good life. But instead they’re treading water.


Why are we here? It seems like for a number of decades standards of living were rising, and people were doing better than their parents. Why did the American dream stop working?

I think that it is the result of a number of factors.

First, we need to keep in mind that “having a better standard of living than your parents did” was much easier for the Baby Boomers, because that was a lower bar to beat. The Boomers were born to parents who had lived through the Great Depression, and many of them were first or second generation immigrants. If your mom and dad walked off the boat with literally $10 in their pocket, and worked their way up from there, or if they were impoverished farmers whose crops failed in the dust bowl, it is a lot more likely that you’ll end up better off than them.

Second is the great stagnation. The boomers benefited from one of the biggest surges in economic growth in history, driven by the mass deployment of new technology, which created new industries. Riding that wave, life got better for a lot of people. If true growth is much slower now, less benefit will accrue to individuals.

But the biggest thing is that the new economy is exclusionary. Information technology and its derivatives, the industries in which almost all the growth we do have is concentrated, are fundamentally about scale, which means that outcomes are compressed into a power-law.

In one of my favorite essays of all time, How to Make Wealth, Paul Graham says that in order to make a lot of wealth, you need to have leverage, in the sense that your decisions scale, and that one of the best ways of getting leverage is by developing technology.

What is technology? It’s technique. It’s the way we all do things. And when you discover a new way to do things, its value is multiplied by all the people who use it. It is the proverbial fishing rod, rather than the fish. That’s the difference between a startup and a restaurant or a barber shop. You fry eggs or cut hair one customer at a time. Whereas if you solve a technical problem that a lot of people care about, you help everyone who uses your solution. That’s leverage.

Doing work with computers, means that you have 0-marginal cost, which means that if you solve a problem once, you’ve solved it an arbitrarily-large number of times, so you can create a huge amount of value in one go.

But there’s a dark side of leverage, which is that it tends to shift the world towards winner-take-all dynamics. A few people who are well suited to this new world (possessing an extreme entrepreneurial mindset, or technical chops, or excellent sales skills, etc.) thrive, producing huge amounts of value. But for the Americans that don’t fit that template, the work-a-day opportunities are becoming less and less attractive, because those roles provide less and less marginal value.

When the marginal cost of goods was non-0, that had a flattening effect on the power law, because even if you couldn’t design a Ford car, you provided value to the final product by working on the assembly line. But today, the marginal cost of most everything in the new economy is 0.

Approximately speaking, everyone has been thrust, whether they like it or not, from Taleb’s Mediocristan into Extremistan. More of progress is dominated by a few super-winners, compared to a generation ago, when progress was, though still on a power law, more equitable.

[I feel like this doesn’t quite explain it. Like sure, wages are stagnating. But if costs were falling faster than wages, then it wouldn’t matter: you would be doing better year by year, just by dint of falling cost of living.

So part of the story is that, for some reason, the rents are too damn high, such that people have trouble just getting by.

Why’s that?

Maybe it has something to do with the fact that almost all of the economic opportunities are concentrated in a few mega-cities? You either have to live there, or be resigned to watching the world decay around you?

Does that mean that if zoning were different, so that there were ten times the number of apartments in SF, NYC, Chicago, DC, and Austin, this problem would basically go away, and people could basically rise with the rising tide of the economy? Or if the pandemic causes the world to decentralize, and people can pretty much live and work from anywhere in the world?

I guess I would be surprised if that were the result of either of those changes. Maybe because there are forces that are preying on people and therefore driving net income to some set point above subsistence, but below personal net-growth?

It might be that the high cost of living is a consequence of the huge disparities in production power: there are the heroes / robber barons of the new economy, who are mostly living in a few mega cities, and almost all the wealth flows from there, so you have to be in the mega-cities to have access to any of the opportunities?

Also, don’t forget about the college debt which is part of what makes the cost of living high, and which has been driven up by other factors.]


Whatever the cause, I now suspect that there are a lot of people that feel like whatever they do, there’s no way for them to get ahead. They feel, justifiably, disenfranchised.

It’s natural, in a situation where you feel trapped, to try and make sense of what is happening to you. There are a number of narratives that people adopt to orient to the situation that they find themselves in. But I think the two biggest ones are Trumpism, and Wokeism.

The archetypal disenfranchised red-triber is frustrated and confused that somehow things are getting worse for him and his peers. The literal person of Donald Trump exploits those feelings, and provides a narrative that America has been taken advantage of by other countries (for instance, Mexico, feeding us illegal immigrants, and China, who is cheating in trade deals), due to something like weakness on the part of the establishment and the coastal elites (who in any case, seem to alienate him or condescend to him at every turn). Under this narrative, these trends need to be reversed, to make America Great Again ™.

The archetypal millennial blue-triber, in contrast, feels frustrated and confused about how she can’t seem to get ahead. She buys into a narrative that the reason for this is systemic oppression: the problem is racism, the patriarchy, and capitalism, and broadly systems that serve their own interests by exploiting weaker groups. Under this story, what is needed is an overhaul of those oppressive systems.

Note of course that this sort of disenfranchised frustration isn’t the only thing feeding these two ideologies. Lots of people who, personally, have an optimistic trajectory might buy into or act in alignment with either one, for all the usual reasons, from virtue signaling, to tribal mimesis, to thinking seriously about the problems and their origins and coming to the conclusion that the Trumpists or the Wokeists are basically correct.

But, I posit that the energy that is driving both movements comes from large swaths of people who feel like, whatever they do, they can’t get ahead. If there were ample individual economic opportunity, people wouldn’t care nearly as much about those problems.

Social media

On top of this, we introduced a new technology to the masses over the past decade and a half. Our social and political lives are organized by twitter and facebook.

This has a divisive, echo-chamber enforcing, effect. The Liberals and the Conservatives have always been at each other’s throats, railing against the other as an outright evil. But I imagine that there was something very different about the world when everyone watched Walter Cronkite on the news every night. There were strong, emotional disagreements, but everyone was, to a large extent, living in the same world.

No longer. Now, the media landscape is fractured, and virtually every event is spun to support one ideology or another. And now almost everyone has a voice in the conversation, due to social media, where before, there were de facto gate-keeper institutions.

Much more than we used to, I think, we live in one of a number of parallel worlds that are layered on top of each other. My guess is that this aggravates tensions, by reinforcing one’s existing narrative. Then, because of the natural memetic incentives, this results in extreme othering and demonization of anyone with a meaningfully different set of priorities.

[It seems like there’s a lot more here, that it might be worth understanding in detail.]

Also, our institutions are falling apart, in the sense of losing their legitimacy, and in the sense of losing their ability to function in pretty basic ways, for related reasons. Basically, it seems like in this cultural battlefield, every institution has to pick a side. This lack of space for “neutrality” edges out the possibility of non-ideological sense-making. (I’m not satisfied with that explanation. I feel like there is a much more detailed story of the incentives that have torn apart newspapers and the universities, for instance.)


On top of all this, we have a pandemic, itself literally-insanely politicized, which forced a lot of people into enclosed spaces with minimal social contact, for months, while we went collectively stir-crazy.

Powder keg

Overall, this reads like a pretty concerning setup. I don’t have a good intuition for how robust the substructure of our world is, but it certainly seems like it is being tested more than any other time in my extremely short lifetime.

Seems maybe bad?

Values are extrapolated urges

[Epistemic status: a non-rigorous theory, representing my actual belief about how it works.]

Related to: Value Differences As Differently Crystallized Metaphysical Heuristics

In this post I want to outline my understanding of what “values” are, at least for human beings. This idea or something very like it may already have standard terminology in academic philosophy, in which case, I would appreciate being pointed to the relevant references. This may be obvious, but I want to say it to lay the groundwork for a puzzle that I want to talk about in the next post.

Basically, I posit that “values”, in the case of human beings, are crystallizations or abstractions of simple response patterns.

[Google doc version for commenting]

Abstracting values from reactions

All animals have a huge suite of automatic reactions to stimuli, both behavioral and affective, and both learned and hard coded.

When thirsty and near water, a lion will drink. When a rabbit detects a predator, it’ll freeze (and panic?). When in heat (and the opportunity presents itself) a giraffe will copulate. When his human comes home, a dog will wag its tail and run up in greeting, presumably in a state of excited happiness. [I note that all of my examples are of mammals.]

Some of these behaviors might be pretty complex, but their basic structure is TAP-like: something happens, and there is some response in the physiology of the animal. I’m going to call this category of “contextualized behaviors and affects”, “urges”.

Humans understand language, which means that the range of situations that they can respond to is correspondingly vaster than most animals. For instance, a human might be triggered (have a specific kind of fear response) by another human making a speech act.

But that isn’t the main thing that differentiates humans in this context. The big difference between humans and most other animals, is that humans can abstract from a multitude of behaviors, to infer and crystallize the “latent intentionality” among those behaviors.

For instance, an early human can reason,

When I see a tiger, I run, and feel extreme overriding panic. If the tiger catches me, I’ll try to fight it. When a heavy rock falls from a cliff, and I hear it falling, I also have a moment of panic, and duck out of the way.

When I am hungry, I eat. When I am thirsty, I drink.

When other people in my tribe have died, I’ve felt sad, and sometimes angry.

…I guess I don’t want to die.

[edit 2022-06-01: More specifically, what’s going on is that the human simulates a bunch of possible scenarios in which he comes to harm or dies, and has a negatively-valenced (flee, retreat, resist) reaction to each one. He intuits the similarity between those scenarios, to abstract out general concepts of harm or death, and associatively learns a general negatively-valenced reaction to those outcomes. He develops a flee-retreat-resist response to anything that involves his dying. He ends up with a goal of “staying alive”. (By default, all of this happens non-verbally, and without any conscious reflection.)]

From each of these disparate, contextualized, urges-to-action and affective responses (which by the way, I posit are not two distinctly different things, but rather two ends of a spectrum), a person notices the common thread, “what do each of these behaviors seem to be aiming towards?”

And abstracting that goal, from the urges, he/she then “owns” it. He/she thinks of him/herself as an entity wanting, valuing, caring about that thing (rather than a bundle of TAPs, some of which are correlated).

My guess is that this abstraction operation is an application of primate (maybe earlier than primate?) social-modeling software to one’s self. It is too expensive to track all of the individual response behaviors of all of the members of your band, but fortunately, you can compress most of the information by modeling them not as adaptation-executors, but as goal directed agents, and keeping track of their goals and their state of knowledge.

When one applies the same trick to one’s own behavior and mental states, one can compress a plethora of detail about a bunch of urges into a compact story about what you want. Voilà. You’ve started running an ego, or a self.

This is the origin of “values.” Values are compressions / abstractions / inferences based on / extrapolated from a multitude of low level reactions to different situations.

I think that most animals can’t and don’t do this kind of inference. Chipmunks (I think) don’t have values. They have urges. Humans can, additionally, extrapolate their urges into values.

I’m pretty sure that something like this process is how people come to their values (in the conventional sense of “the things they prioritize”) in real life.

For instance, I am triggered by claims and structures that I perceive as threats to my autonomy. I flinch away defensively. I think that this has shaped a lot of my personality, and choices, including leading me into prizing rationality.

Furthermore, I posit that something like this process is how people tend to adopt political ideologies. When someone hears about the idea of redistribution, and their visceral sense of that is someone taking things from them, they have a (maybe subtle) aversion / threatened feeling.* This discomfort gives rise to an urge to skepticism of the idea. And if such a person hangs out with a bunch of other people that have similar low-level reactions, eventually, it becomes common knowledge, and this becomes the seed of an ideology, that gets modified and reinforced by all the usual tribal mechanisms.

I think the same basic thing can happen when someone feels (probably less than consciously) threatened by all kinds of ideologies. And this + social mimesis is how people end up with “conservative values” or “liberal values” or “libertarian values” or what have you.

* – I have some model of how this works, the short version being, “stimuli trigger associated (a lot of the action here is in the association function) mental imagery, which gives rise to a valence, which guides immediate action, modulo further, more consequentialist deliberation.” In fact, you can learn to consciously catch glimpses of this happening.

Of course all of this is a simplification. Probably this process occurs hierarchically, where we abstract some goals from TAP-like urges, and then extrapolate more abstract goals from those, and so on until we get to the “top” (if it turns out that there is a “top”, as opposed to a cycle that has some tributaries that flow into it).

For that reason, the abstraction / crystallization / triangulation process is not deterministic. It is probably very path dependent. Two people with the exact same base level pattern of urges, in different contexts, will probably grow into people with very different crystallized values.

Values influence behavior

Now a person might abstract out their values from their behavior in a way that is largely non-consequential. They model themselves, and describe themselves, in terms of their values, but that is just talk. The vast majority of their engagement in the world is still composed of the behaviors stemming from their urges in response to specific situations.

But it also sometimes happens that abstracting out values, and modeling one’s self as an optimizer (or something like an optimizer) for those values, can substantially affect things at the level of behavior.

For one thing, having a shorthand description of what one cares about means that one can use that description for deliberation. Now, when considering what to do in a situation, a person might follow a mental process that involves asking how they can achieve some cached goal, instead of reflexively acting on the basis of the lower level urges that the goal was originally abstracted from.

This means that a person might end up acting in a way that is distinctly in opposition to those low level reactions.

For instance, a person might want status and respect, and they can feel the tug to go drink and socialize “with the guys” of their age group, but they instead stay home and study, because they reason that this will let them get a good job, which will let them get rich, which they equate with having a lot of status.

Or a person might take seriously that they don’t want to die, and sign up for cryonics, even though none of their urges recommended that particular action, and in fact, it flies in the face of their social conformity heuristics.

Furthermore, in this vein a person might notice inconsistencies between their professed values and the way they behave, or between multiple diverse values. And if they are of a logical turn of mind they may attempt to modify their own behavior to be more in line with their values. Thus we end up with moral striving (though moral striving might not be the only version of this dynamic).


Just to say this explicitly, humans, uniquely (I think? maybe some other animals also abstract their values), can examine some particular behavior or reaction and consider it to be a bug, a misfiring, where the system is failing to help them achieve their values.

For instance, I’m told that a frog will reflexively flick out its tongue to ensnare anything small and black that enters its field of vision. From the perspective of evolution, this is a bug: the behavior is “intended” for catching and eating flies, and eating bits of felt that human researchers throw in the air (or whatever) is not part of the behavior selected for. [Note: in talking about what evolution “intended”, we’re executing the same mental move of abstracting goals and values from behavior. Evolution is just the fact of what happened to replicate, but we can extrapolate from a bunch of specific contextualized adaptations to reason about what evolution is “trying to do”.]

But I claim here that asking “is this behavior a bug, from the frog’s perspective?” is a mis-asked question, because the frog has not abstracted its values from its behaviors in order to reflect back on its behaviors and judge them.

In the parallel case of a human masturbating, the human can abstract his or her values from his or her behavior, and could deem masturbation either 1) a bug, a dis-endorsed behavior that arises from a hormonal system that is partially implementing his or her values, but which misfires in this instance, or 2) an expression of what he or she actually values, part of a life worth living.

(Now it might or might not be the case that only one of these options is reflectively stable. If only one of them is, for humans in general, there is still a meaningful sense in which one can be mistaken about which things are Good. That is, a person can evaluate something as aligned with their values, but would come to think differently in the limit of reflection.)

A taxonomy of Cruxes

[crossposted to LessWrong]

This is a quick theoretical post.

In this post, I want to outline a few distinctions between different kinds of cruxes. Sometimes folks will find what seems to be a crux, but they feel some confusion, because it seems like it doesn’t fit the pattern that they’re familiar with, or it seems off somehow. Often this is because they’re familiar with one half of a dichotomy, but not the other.

Conjunctive, unitary, and disjunctive cruxes

As the Double Crux method is typically presented, double cruxes are described as single propositions, about which, if you changed your mind, you would change your mind about another belief.

But as people often ask,

“What if there are two propositions, B and C, and I wouldn’t change my mind about A if I just changed my mind about B, and I wouldn’t change my mind about A if I just changed my mind about C, and I would only change my mind about A if I shifted on both B and C?”

This is totally fine. In this situation we would just say that your crux for A is a conjunctive crux of B and C.

In fact, this is pretty common, because people often have more than one concern in any given situation.

Some examples:

  • Someone is thinking about quitting their job to start a business, but they will only pull the trigger if a) they think that their new work would actually be more fulfilling for them, and b) they know that their family won’t suffer financial hardship.
  • A person is not interested in signing up for cryonics, but offers that they would if a) it was inexpensive (on the order of $50 a month) and b) the people associated with cryonics were the sort of people that they wanted to be identified with. [These are the stated cruxes of a real person that I had this discussion with.]
  • A person would go vegetarian if, a) they were sure it was healthy for them and b) doing so would actually reduce animal suffering (going a level deeper: how elastic is the supply curve for meat?).

In each of these cases there are multiple considerations, none of which is sufficient to cause one to change one’s mind, but which together represent a crux.
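To make the pattern concrete, here is a minimal sketch of the vegetarian example as a toy model. The function names, the proposition names, and the assumption that the decision is a plain "and" of two yes/no beliefs are all mine, for illustration only; real belief structures are messier than this.

```python
def is_crux(decide, beliefs, flipped):
    """Return True if flipping every proposition in `flipped` changes the decision."""
    changed = {k: (not v if k in flipped else v) for k, v in beliefs.items()}
    return decide(changed) != decide(beliefs)


def goes_vegetarian(b):
    # Hypothetical decision rule: both considerations must hold.
    return b["healthy"] and b["reduces_suffering"]


# The person currently doubts both propositions.
current = {"healthy": False, "reduces_suffering": False}

# Flipping either belief alone does not move the decision...
single_flips = {name: is_crux(goes_vegetarian, current, {name}) for name in current}

# ...but flipping both together does, so the pair is a conjunctive crux.
joint_flip = is_crux(goes_vegetarian, current, set(current))
```

In this toy model neither proposition is a crux on its own, but the two together are, which is what “conjunctive crux” is pointing at.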

Although conjunctive cruxes are common, I will say that folks are sometimes too fast to assert that they would only change their mind if they turned out to be wrong about a large number of conjunctive terms.

When you find yourself in this position of only changing your mind on the basis of a large number of separate pieces, this is a flag that there may be a more unified crux that you’re missing.

In this situation I would back up and offer very “shallow” cruxes. Instead of engaging with all the detail of your model, look for a very high level / superficial summary, and check if that is a crux. Following a chain of many shallow cruxes is often easier than trying to get into the details of complicated models right off the bat.

(Alternatively, you might move into something more like consideration factoring.)

As a rule of thumb, the number of parts to a conjunction should be small: two is common, three is not that common. Having a 10-part conjunction is implausible. Most people can’t hold that many elements in their head all at once!

I’ve occasionally seen disjunctive arguments / conjunctive cruxes with on the order of 10 parts in technical papers, though I think it is correct to be suspicious of them. They’re often of the form “argument one is sufficient, but even if it fails, argument two is sufficient, and even if that one fails…” But errors are often correlated, and the arguments are likely not as independent as they may at first appear. It behooves you to identify the deep commonality between your lines of argument, the assumptions that multiple arguments are resting on, because then you can examine it directly. (Related to the “multiple stage fallacy”.)

Now of course, one could in principle have a disjunctive crux, where if they changed their mind about B or about C, they would change their mind about A. But, in that case there’s no need to bundle B and C. I would just say that B is a crux for A and also C is a crux for A.

Causal cruxes vs. evidential cruxes

A causal crux back-traces the causal arrow of your belief structure. You find one by answering the question “why do I believe [x]?” or “what caused me to think [x] in the first place?” and checking if the answer is a crux.

For instance, someone is intuitively opposed to school uniforms. Introspecting on why they feel that way, they find that they’re expecting (or afraid) that that kind of conformity squashes creativity. They check if that’s a crux for them (“what if actually school uniforms don’t squash creativity?”), and find that it is: they would change their mind about school uniforms if they found that they were wrong about the impact on creativity.

Causal cruxes trace back to the reason why you believe the proposition.

In contrast, an evidential crux is a proxy for your belief. You might find evidential cruxes by asking a question like “what could I observe, or find out, that would make me change my mind?”

For instance (this one is from a real double crux conversation that happened at a training session I ran), two participants were disagreeing about whether advertising destroys value on net. Operationalizing, one of them stated that he’d change his mind if he realized that beer commercials, in particular, didn’t destroy value.

It wasn’t as if he believed that advertising is harmful because beer commercials destroy value. Rather it was that he thought that advertising for beer was a particularly strong example of the general trend that advertising is harmful. So if he changed his mind in that instance, where he was most confident, he expected that he would be compelled in the general case.

In this case “beer commercials” are serving as a proxy for “advertising.” If the proxy is well chosen, this can totally serve as a double crux. (It is, of course, possible that one will be convinced that they were mistaken about the proxy, in a way that doesn’t generalize to the underlying trend. But I don’t think that this is significantly more common than following a chain of cruxes down, resolving at the bottom, and then finding that the crux that you named was actually incomplete. In both cases, you move up as far as needed, adjust the crux (probably by adding a conjunctive term), and then traverse a new chain.)

Now, logically, these two kinds of cruxes both have the structure “If not B, then not A” (“if uniforms don’t squash creativity, then I wouldn’t be opposed to them anymore.” and “if I found that beer commercials in fact do create value, then I would think that advertising doesn’t destroy value on net”). In that sense they are equivalent.

But psychologically, causal cruxes traverse deeper into one’s belief structure, teasing out why one believes something, and evidential cruxes traverse outward, teasing out testable consequences or implications of the belief.

Monodirectional vs. Bidirectional cruxes

Say that you are the owner of a small business. You and your team are considering undertaking a major new project. One of your employees speaks up and says “we can’t do this project. The only way to execute on it would bankrupt the company.”

Presumably, this would be a crux for you. If you knew that the project under consideration would definitely bankrupt the company, you would definitively think that you shouldn’t pursue that project.

However, it also isn’t a crux, in this sense: if you found out that that claim was incorrect, that actually you could execute on the project without bankrupting your company, you would not, on that basis alone, definitively decide to pursue the project.

This is an example of a monodirectional crux. If the project bankrupts the company, then you definitely won’t do it. But if it doesn’t bankrupt the company then you’re merely uncertain. This consideration dominates all the other considerations, it is sufficient to determine the decision, when it is pointing in one direction, but it doesn’t necessarily dominate when it points in the other direction.

(Oftentimes, double cruxes are composed of two opposite monodirectional cruxes. This can work totally fine. It isn’t necessary that for each participant the whole question turns on the double crux, so long as for each participant, flipping their view on the crux (from their current view) would also cause them to change their mind about the proposition in question.)

In contrast, we can occasionally identify a bidirectional crux.

For instance, if a person thinks that public policy ought to optimize for Quality Adjusted Life Years, and they’ll support whichever health care scheme does that, then “maximizing QALYs” is a bidirectional crux. That single piece of information (which plan maximizes QALYs) completely determines their choice.

“A single issue voter” is a person voting on the basis of a bidirectional crux.

In all of these cases you’re elevating one of the considerations over and above all of the others.
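The distinction can be sketched as a small check over a toy decision model. Everything here is an illustrative assumption on my part: the function names, the idea of lumping “all the other considerations” into a single yes/no value, and the specific decision rules for the two examples.

```python
def crux_direction(decide, other_considerations):
    """Classify a yes/no proposition X with respect to a decision.

    `decide(x, others)` returns the decision (True/False) given the truth of X
    and one possible state of the other considerations.
    """
    outcomes_if_true = {decide(True, o) for o in other_considerations}
    outcomes_if_false = {decide(False, o) for o in other_considerations}
    settles_when_true = len(outcomes_if_true) == 1
    settles_when_false = len(outcomes_if_false) == 1

    if settles_when_true and settles_when_false:
        # X settles the decision both ways; it only matters if the answers differ.
        return "bidirectional" if outcomes_if_true != outcomes_if_false else "not a crux"
    if settles_when_true or settles_when_false:
        return "monodirectional"
    return "not a crux"


# Hypothetical rule for the business example: pursue the project only if it
# does not bankrupt the company AND it otherwise looks worthwhile.
def pursue_project(bankrupts, looks_worthwhile):
    return (not bankrupts) and looks_worthwhile


# Hypothetical rule for the QALY example: support plan A exactly when it
# maximizes QALYs, whatever else is true.
def support_plan_a(maximizes_qalys, _other):
    return maximizes_qalys


project_kind = crux_direction(pursue_project, [True, False])
qaly_kind = crux_direction(support_plan_a, [True, False])
```

Under these assumptions, bankruptcy comes out monodirectional (it settles the question only when true) and QALY-maximization comes out bidirectional (it settles the question either way).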

Pseudo cruxes

[This section is quite esoteric, and is of little practical relevance, except for elucidating a confusion that folks sometimes encounter.]

Because of the nature of monodirectional cruxes, people will sometimes find pseudo-cruxes, propositions that seem like cruxes, but are nevertheless irrelevant to the conversation.

To give a (silly) example, let’s go back to the canonical disagreement about school uniforms. And let’s consider the proposition “school uniforms eat people.”

Take a person who is in favor of school uniforms. The proposition that “school uniforms eat people” is almost certainly a crux for them. The vast majority of people who support school uniforms would change their mind if they were convinced that school uniforms were carnivorous.

(Remember, in the context of a Double Crux conversation, you should be checking for cruxy-ness independently of your assessment of how likely the proposition is. The absurdity heuristic is insidious, and many claims that turn out to be correct, seem utterly ridiculous at first pass, lacking a lot of detailed framing and background.)

This is a simple crux. If the uniform preferring person found out that uniforms eat people, they would come to disprefer uniforms.

Additionally, this is probably a crux for folks who oppose school uniforms as well, in one pretty specific sense: were all of their other arguments to fall away, knowing that “school uniforms eat people” would still be sufficient reason for them to oppose school uniforms. Note that this doesn’t mean that they do think that school uniforms eat people, nor does it mean that finding out that school uniforms don’t eat people (duh) would cause them to change their mind, and think that school uniforms are good. We might call this an overdetermining hypothetical crux. It’s a monodirectional crux that points exclusively in the direction that a person already believes, and which, furthermore, the person currently assumes to be false.

A person might say,

“I already think that school uniforms are a bad idea, but if I found out they eat people, that would be further reason for me to reject them. Furthermore, now that we’re discussing the possibility, ‘school uniforms don’t eat people’ is such an important consideration that it would have to be a component of any conjunctive crux that would cause me to change my mind and think that school uniforms are a good idea. But I don’t actually think that school uniforms eat people, so it isn’t a relevant part of that hypothetical conjunction.”

This is a complicated series of claims. Essentially, this person is saying that in a hypothetical world where they thought differently than they currently do, this consideration, if it held up, would be a crux for them (one that would bring them to the position that they actually hold, in reality).

Occasionally (on the order of once out of 100?), a novice participant will find their way to a pseudo crux like that one, and find themselves confused. They can tell that the proposition “school uniforms eat people”, if true, matters for them. It would be relevant for their belief. But it doesn’t actually help them push the disagreement forward, because, at best, it pushes further in the direction of what they already think.

(And secondarily, it isn’t really an opening for helping their partner change their mind, because the uniform-dispreferring person doesn’t actually think that school uniforms eat people, and so would only try to argue that they do if they had abandoned any pretense of truth-seeking in favor of trying to convince someone using whatever arguments will persuade, regardless of their validity.)

So this seems like a crux, but it can’t do work in the Double Crux process.

There is another kind of pseudo crux stemming from monodirectional cruxes. This is when a proposition is not a crux, but its inverse would be.

In our school uniform example, suppose that in a conversation, someone boldly, and apropos of nothing, asserted “but school uniforms don’t eat people.” Whether uniforms eat people is a monodirectional crux that dominates all the other considerations, but school uniforms not eating people is so passé that it is unlikely to be a crux for anyone (unless the reason they were opposed to school uniforms was kids getting eaten). Nevertheless, there is something about it that seems (correctly) cruxy. It is the ineffectual side of a monodirectional crux. It isn’t a crux, but its inverse is. We might call this a crux shadow or something.

Thus, there is a four-fold pattern of monodirectional cruxes, where one quadrant is a useful, progress-bearing crux, and the other three contain different flavors of pseudo cruxes.

Proposition: “If school uniforms eat people, then I would oppose school uniforms”

| | Suppose school uniforms eat people | Suppose school uniforms don’t eat people |
|---|---|---|
| I am opposed to school uniforms | Overdetermining hypothetical crux: “I would oppose school uniforms anyway, but this would be a crux for me, if (hypothetically) I was in favor of school uniforms.” | Non-crux / Crux shadow: “Merely not eating people is not sufficient to change my mind. Not a crux.” |
| I am in favor of school uniforms | Relevant (real) monodirectional crux: “If school uniforms actually eat people, that would cause me to change my mind.” | Non-crux / Crux shadow: “While finding out that uniforms do eat people would sway me, that they don’t eat people isn’t a crux for me.” |

And in the general case,

Proposition: X is sufficient for A, but Not X is not sufficient for B

| | X is true | X is false |
|---|---|---|
| I believe A | Overdetermining hypothetical crux | Non-crux / Crux shadow |
| I believe B | Relevant monodirectional crux | Non-crux / Crux shadow |
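For readers who find it easier to read as a rule than as a grid, the general four-fold pattern can be written as a tiny function. This is just a restatement of the table under its stated proposition (X is sufficient for A, but not-X is not sufficient for B); the function and argument names are made up for illustration.

```python
def classify_quadrant(current_belief, x_supposed_true):
    """Label one quadrant of the four-fold pattern.

    Assumes X is sufficient for A, while not-X is not sufficient for B.
    `current_belief` is "A" or "B"; `x_supposed_true` is the hypothetical
    being entertained about X.
    """
    if current_belief not in ("A", "B"):
        raise ValueError("current_belief must be 'A' or 'B'")
    if not x_supposed_true:
        # Not-X settles nothing for anyone: the ineffectual side of the crux.
        return "non-crux / crux shadow"
    if current_belief == "A":
        # X would only pile on to a position the person already holds.
        return "overdetermining hypothetical crux"
    # X would actually move the person from B to A.
    return "relevant monodirectional crux"


quadrants = {
    (belief, x): classify_quadrant(belief, x)
    for belief in ("A", "B")
    for x in (True, False)
}
```

Only one of the four entries, the person who believes B entertaining that X is true, is a crux that can do work in the conversation.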

Note that the basic double crux pattern avoids accidentally landing on pseudo cruxes.