Thinking about how to orient to a hostile information environment, when you don’t have the skills or the inclination to become an epistemology nerd

Successfully propagandized people don’t think they’ve been propagandized; if you would expect to feel the same way in either case, you have to distinguish between the two possibilities using something other than your feelings.

Duncan Sabien

I wish my dad understood this point.

But it’s pretty emotionally stressful to live in a world where you can’t trust your info streams and you can’t really have a grasp on what’s going on.

Like, if I tell my dad not to trust the New York times, because it will regularly misinform him, and that “science” as in “trust the science” is a fake buzzword, about as likely to be rooted in actual scientific epistemology as not, he has few reactions. But one of them is “What do you want me to do? Become a rationalist?”

And he has a point. He’s just not going to read covid preprints himself, to piece together what’s going on. That would take hours and hours of time that he doesn’t want to spend, it would be hard and annoying and it isn’t like he would have calibrated Bayesian takes at the end.

(To be clear, I didn’t do that with covid either, but I could do it, at least somewhat, if I needed to, and I did do little pieces of it, which puts me on a firmer footing in knowing which epistemic processes to trust.)

Give that he’s not going to do that, and I don’t really think that he should do that, what should he do?

One answer is “just downgrade your confidence in everything. Have a blanket sense of ‘actually, I don’t really know what’s going on.’ ” A fundamental rationalist skill is not making stuff up, and saying “I don’t know.” I did spend a few hours tying to orient on the Ukraine situation, and forcing myself to get all the way to the point of making some quantiative predictions (so that I have the opportunity to be surprised, and notice that I am surprised). But my fundamental stance is “I don’t understand what’s going on, and I know that I don’t understand. (Also here are some specific things that I don’t know.)”

…Ok. Maybe that is feasible. It’s pretty hard to live in a world where you fundamentally don’t know what’s happening, where people assume you have some tribal opinion about stuff and your answer is “I don’t know, I think my views are basically informed by propaganda, and I’m not skilled enough or invested enough to try to do better, so I’m going to not believe or promote my takes.”

But maybe this becomes easier if the goal of your orientation in the world is less to have a take on what’s going on, but is instead to prioritize uncertainties: to figure out which questions seem most relevant for understanding, so that you have _some_ map to orient from, even if it is mostly just a map of your uncertainty.

Some un-edited writeups of conversations that I’ve had with Ben Hoffman, and Jessica Taylor, and Michael Vassar

Extended Paraphrase of Ben and Jessica’s general view [December 2019]

Eli’s Summary of a Conversation with Vassar [October 2020]

Overview of my conversation with Vassar on Wednesday Feb 10: Trauma patterns [February 2021]

Eli’s notes from Spencer’s interview with Vasser [February 2021]

Parsing Ben and Michael’s key points from that giant twitter conversation [May 2021]

Two interlocking control systems

When I was practicing touch typing I found that much of the skill was a matter of going as fast as I could, without letting my speed outpace my accuracy. If I could feel that the precision of my finger placements was high, I would put more “oomph” into my typing, pushing harder to go faster. 

But I would often fall into an attractor of “rushing” or “going off the rails”, where I was pushing to go fast in a way that caused my accuracy to fall apart, and I started to “trip over myself”. I made a point to notice this starting to happen and then intentionally slow down (and relax my shoulders) to focus on the precision of my finger placements. The goal was never to rush (because that is counter productive), but to go as fast as possible within that constraint.

I think there might be an analogous thing in my personal productivity. 

When I have a largish amount to get done in a short amount of time, this can be energizing and motivating. My physiological arousal is higher. The my personal tempo faster. There’s a kind of energy or motivation that comes from having things that need to get done, with deadlines, and it boots me up into a higher energy orbital, where my default mental actions are geared towards making progress, instead of random “I don’t feel like it” sort of laziness. There’s a bit of a tailwind behind me.

(Indeed, this kind of pressure is exactly what was missing for most of 2020.)

However, sometimes this pressure gets overwhelming, and my intentionality collapses. It’s too much. Either I don’t have the spaciousness to let my attention fully engage with any given task (which is usually necessary for making progress) because of the competing goal threads, instead only managing a shallow superficial attention, or I’ll get overwhelmed and opt out of all of it by distracting myself.

There’s this important principle that I never want my tailwind to outpace my structure. Having some amount of pressure speeding me along is great, but only if my intentionality is high enough to still absorb everything that’s coming at me, taking in the input of what’s important, orienting to it, and taking action on it.

Too much tail wind and that intentionality collapses.

Which means that I need a control system that keeps those two metrics in sync. I need to notice when my intentionality is starting to collapse, and take actions to slow things down and to shore up my intentionality. 

However, my intentionality can collapse for another reason, other than getting outpaced by motivation-pressure. It also collapses when I’m low on energy and alertness.

My intentionality depends on my personal energy and alertness. When my energy and alertness is depleted, the inner structure of my intentionality tends to collapse. 

(There are some caveats here. For one thing, it is possible to maintain intentionality in a low energy state. Also, I can sometimes depend on external structure as a substitute for intentionality, and external structure depends much less on my personal energy and alertness. But to a first approximation, low energy -> low intentionality.)

As a consequence of this, the control system maintaining my intentionality propagates back to an earlier control system maintaining my energy level. I want to notice when my energy is flagging, when I’m just starting to run on fumes, and take action to shore up my energy, before my intentionality collapses.

Furthermore, because my personal energy and alertness is at the bottom of the stack, a lot of my energy and alertness maintenance is not structured as a control system. I employ strategies to get good sleep, and to exercise every day, independently of my current energy level, because high energy is self sustaining.

Having some practices that are “foundational” rather than implemented as control systems is costlier, because it means that I’ll sometimes engage in them when they are not strongly necessary. But foundational systems are more robust: they have more slack in the system to absorb peterbations.

Aversions inhibit slow focus

I’ve written elsewhere about how the biggest factor in my personal productivity is aversions, and skillfully engaging with aversions. It’s maybe not unsurprising that having an aversion to task is relevant to effectively executing on that task. But it is a bit more surprising that having an aversion to some task or consideration, makes it much much less likely that I’ll effectively execute on anything.

The key insight, I think, is engaging deeply in a task entails clearing some mental space.

Aversion to something increases my compulsiveness / distractibility. I’m more likely to take a bathroom break, or to make food for myself, or to rereard old blog posts on my phone (without jotting down my thoughts in the way that makes reading more productive / creative), or to go check twitter and then get stuck in the twitter loop.[1] 

I think this is because I’m feeling some small constant pain, and part of me is compulsively seeking positive stimulation to distract from the pain. Basically holding an aversion makes me more reactive to stray thoughts and affordances of the environment. My immediate actions are driven by a (subtle, but nevertheless dominating) clawing, grasping, drive for positive sensation, instead of flowing from “my deep values”, my sense of what seems cool or alive. 

Most, but not all, forms of creative work, involve making mental space, quieting those distractions so that I can give my full attention to the thing that I’m trying to do. The reason why aversions kill my productivity is that my compulsive stimulation-hunger is too graspy to settle down into any long-threaded thought. That part of me doesn’t want to be still, because it is seeking distraction from the sensation in me.

(The exception is some forms of work that “fit” this compulsiveness, where I can get sucked into compulsively doing some task as a way to distract from the sensation in my body. Sometimes an essay is of the right shape that it can be a hook in just the right way, but most of my work is not like this.)

Generally, when I notice an aversion, I’ll engage with it directly, either by sitting down and meditating, feeling into the sensation in a non semantic way, or by doing focusing / journaling, which is more of a semantic “dialogue”, or something that is a mix of both approaches.

In doing this, I’m first just trying to make space for the sensation, to feel it without distraction, while also being welcoming towards the part of me that is doing the dissociation, and secondly hoping to get more understanding and context, so that I can start planning and taking action regarding the underlying concern of the aversion.

[1] I found myself doing all of these except the last one today, all the while vaguely / liminally aware of the agitation clench in my belly, before I sat down to engage with it directly.

On category I and category II problems

Please note that while I make claims about what I understand other people to believe, I don’t speak for them, and might be mistaken. I don’t represent for MIRI or CFAR, and they might disagree. The opinions expressed are my own.

Someone asked me why I am broadly “pessimistic”. This post, which is an articulation of an important part of my world view, came out of my answer to that question.

Two kinds of problems: 

When I’m thinking about world scale problems, I think it makes sense to break them down into two categories. These categories are nebulous (like all categories in the real world), but I also think that these are natural clusters, and most world scale problems will clearly be in one or the other.

Category I Problems

Let’s say I ask myself “Will factory farming have been eliminated 100 years from now?” (Setting aside x-risk for a moment) I can give a pretty confident “Yes.”

It looks to me that we have a pretty clear understanding of what needs to happen to eliminate factory farming. Basically, we need some form of synthetic meat (and other animal products) that is comparable to real meat in taste and nutrition, and that costs less than real meat to produce. Once you have that, most people will prefer synthetic meat to real meat, demand for animal products will drop off markedly, and the incentive for large scale factory farming will vanish. At that point the problem of factory farming is basically solved.

Beyond and Impossible meat is already pretty good, maybe even good enough to satisfy the typical consumer. That gives us proof of concept. So eliminating factory farming it is almost just a matter of making the cost of synthetic meat < the cost of real meat.

There’s an ecosystem of companies (notably Impossible and Beyond, plus some cultured meat startups) that are iteratively making progress on exactly the problem of driving down the cost. Possibly, it will take longer than I expect, but unless something surprising happens, that ecosystem will succeed eventually.

In my terminology, I would say that there exists a progressmachine that is systematically ratcheting towards a solution to factory farming. There is a system that is making incremental progress towards the goal, month by month, year by year. If things continue as they are, and there aren’t any yet-unseen fundamental blockers, I have strong reason to think that the problem will be fully solved eventually.

Now, this doesn’t mean that the problem is already solved, figuring out ways to make this machine succeed marginally faster is still really high value, because approximately a billion life-years are spent in torturous conditions in factory farms every year. If you can spend your career speeding up the machine by six months, that prevents at least a half a billion life-years of torture (more than that if you shift the rate of progress instead of just translating the curve to the left).

Factory farming is an example of what (for lack of a better term), I will call a “category I” problem. [tentative name “turn-crank problems”]

Category II Problems

In contrast, suppose I ask “Will war have been eliminated 100 years from now?” To this question, my answer has to be “I don’t know, but probably not?”

That’s not because I think that ending war is impossible. I’m pretty confident that there exists some set of knowledge (of institution design? Of game theory? Of diplomacy?) with which we could construct a system that robustly avoided war, forever. 

But in my current epistemic state, I don’t even have a sketch of how to do it. There isn’t a well specified target that if we hit it, we’ll have achieved victory (in the way that “cost of synthetic meat < cost of real meat” is a victory criterion).  And there is no machine that is systematically ratcheting towards progress on eliminating war.

That isn’t to say that there aren’t people working on all kinds of projects which are trying to help with the problem of “war”, or reducing the incidence of war (peace talks, education, better international communication, what have you). There are many people, in some sense, working hard at the problem. 

But their efforts don’t cohere into a mechanism that reliably produces incremental progress towards solving the problem. Or said differently, there are things that people can do that help with the problem on the margin, but those marginal improvements don’t “add up to” a full solution. It isn’t the case that if we do enough peace talks and enough education and enough translation software, war will be permanently solved. 

In a very important sense, humanity does not have traction on the problem of ending war, in a way that it does have traction on the problem of ending factory farming. 

Ending war isn’t impossible in principle, but there are currently no levers that we can pull to make systematic progress towards a solution. We’re stuck doing things that might help a little on the margin, but we don’t know how to set up a system such that if that machine runs long enough, war will be permanently solved. 

I’m going to call problems like this, where there does not exist a machine that is making systematic progress, a “category II” problem. [tentative name “non-turn-crank problems”]

Category I vs. Category II

Some category I problems: 

  • Factory Farming
  • Global Poverty
  • “Getting off the rock” (but possibly only because Elon Musk was born, and took personal responsibility for it)
  • Getting a computer to almost every human on earth
  • Legalizing / normalizing gay marriage 
  • Eradicating Malaria
  • Possibly solving aging?? (and if so, probably only because a few people in the transhumanism crowd took personal responsibility for it)

Some Category II problems

  • War
  • AI alignment
  • Global coordination
  • Civilizational sanity / institutional decision making (on the timescale of the next century)
  • Civilizational collapse
  • Achieving stable, widespread mental health
  • Political polarization
  • Cost disease

If you can draw a graph that shows the problem more-or-less steadily getting better over time, it’s a category I problem. If progress is being made in some sense, but progress happens in stochastic jumps, or it’s non-obvious how much progress was made in a particular period, it’s a category II problem.

In order for factory farming to not be solved, something surprising, from outside of my current model, needs to happen. (Like maybe technological progress stopping, or a religious movement that shifts the direction of society.)

Whereas in order for war to be solved, something surprising needs to happen. Namely, there needs to be some breakthrough, a fundamental change in our understanding of the problem, that gives humanity traction on the problem, and enables the construction of a machine that can make systematic, incremental progress.  

(Occasionally, a problem will be in an inbetween category, where there doesn’t yet exist a machine that is making reliable progress on the problem, but that isn’t because our understanding of the shape of the problem is insufficient. Sometimes the only reason why there isn’t a machine doing this work is only because no person or group, of sufficient competence, has taken heroic responsibility for getting a machine started. 

For instance, I would guess that our civilization is capable of making steady, incremental progress on the effectiveness of cryonics, up to the point cryonics being a reliable, functional, technology. But progress on “better cryonics” is mostly stagnant. I think that the only reason there isn’t a machine incrementally pushing on making cryonics work is that no one (or at least no one of sufficient skill) has taken it upon themselves to solve that problem, thereby jumpstarting a machine that makes steady incremental progress on it. [ 1 ]

It is good to keep in mind that these category 1.5 problems exist, but they mostly don’t bear on the rest of this analysis.)

Maybe the most important thing: Category I problems get solved as a matter of course. Category II problems get solved after we stumble into a breakthrough that turns them into category I problems.

Where in this context, a “breakthrough” is when “some change either in our understanding (the map) or in the world (the territory) that causes the shape of the problem to shift, such that humanity can now make reliable systematic progress towards a solution, unless something surprising happens.”

Properties of Category I vs. Category II problems

Category I problemsCategory II problems
There exists a “machine” that is making systematic, reliable progress on the problemThere isn’t a “machine” making systematic, reliable progress on the problem, and humanity doesn’t yet know how to make such a machine
Marginal improvements can “add up” to a full solution to the problemMarginal improvements don’t “add up” to a full solution to the problem
The problem will be solved, unless something surprising happensThe problem won’t be solved until something surprising happens
We’re mostly not depending on luckWe’re substantially depending on luck
Progress is usually incrementalProgress is usually stochastic
Progress is pretty “concrete”; it is relatively unlikely that some promising project will turn out to be completely uselessProgress is “speculative”; it is a live possibility that any given pieces of work that we consider progress will later prove completely useless in light of better understanding
The bottleneck to solving the problem is the machine going better or fasterThe bottleneck to solving the problem is a breakthrough, defined as “some shift (either in our map or in the territory) that changes the shape of the problem enough that we can make reliable systematic progress on it”
Solving the problem does not require any major conceptual or ontological shifts; progress consists of many, constrained, engineering problemsOur understanding of the problem, or ontology of the problem, will change at least once, but most likely many times, on the path to a full solution
There might be graphs that show the problem getting better, more-or-less steadily, over time.It’s quite hard to assess how much progress is made in a given unit of time, or even if exciting “milestones” actually constitute progress
We know how to train people to fill roles in which they can reliably contribute to progress on the problem, mostly what is needed is effective people to fill those rolesWe have only shaky and tenuous knowledge of how to train people to make progress on the problem; mostly what’s needed is people who can figure out for themselves how to get orientation for themselves
If all the relevant inputs were free, the problem would be solved or very close to solved.

(eg with a perpetual motion machine that produced arbitrarily large amounts of the ingredients to impossible meat, factory farming would be over)
If the inputs were free, this would not solve the problem

(eg with a hypercomputer, AI safety would not be solved)
Properties of Category I vs Category II problems

Luck and Deadlines

In general, I’m optimistic about category I problems, and pessimistic about category II problems (at least in the short term). 

And crucially, humanity doesn’t know how to systematically make progress on intentionally turning a given category II problem into a category I problem. 

We’re not hopeless at it. It is not completely a random walk. Individual geniuses have sometimes applied themselves to a problem that we were so confused about as to not have traction on it, and produced a breakthrough that gives us traction. For instance, Newton giving birth to mathematicized physics. [Note: I’m not sure that this characterization is historically correct.]

But overall, when those breakthroughs occur, it tends to be in large part due to chance. We mostly don’t know how to make breakthroughs, on particular category II problems, on demand. 

Which is to say, any given problem transitioning from II to I depends on luck. 

And unfortunately, it seems like some of the category II problems I listed above 1) are crucial to the survival of civilization, and 2) have deadlines.

It looks like, from my epistemic vantage point, that if we don’t solve some subset of those problems before some unknown deadline (possibly as soon as this century), we die. That’s it. Game over.

Human survival depends on solving some problems for which we currently have no levers. There is nothing that we can push on to make reliable, systematic progress. And there’s no machine making reliable, systematic progress.

Absent a machine that is systematically ratcheting forward progress on the problem, there’s no strong reason to think that it will be solved. 

Or to state the implicit claim more explicitly: 

  1. Large scale problems are solved only when there is a machine incrementally moving towards a solution. 
  2. There are a handful of large scale problems that seem crucial to solve in the medium term. 
  3. There aren’t machines incrementally moving towards solutions to those problems.

So by default, unless something changes, I expect that those problems won’t be solved.

On AI alignment

There are people who dispute that AI risk is a category II problem, and they are accordingly more optimistic. I believe that Rohin Shah and Paul Christiano both think that there’s a pretty good chance that business-as-usual AI development will solve alignment as a matter of course, because alignment problems are on the critical path to making functioning AI. 

That is, they think that there is an existing machine that is solving the problem: the AI/ML field itself.

If I understand them correctly, they both think that there is a decent chance that their EA-focused alignment work will have been counterfactually irrelevant in retrospect, but it still seems like a good altruistic bet to lay groundwork for alignment research now. 

In the terms of my ontology here, they see themselves as shoring up the alignment-progress machine, or helping it along with some differential progress, just in case the machine turns out to have been inadequate to the task of solving the problem before the deadline. Even though they think that there is a substantial chance that their work will, in retrospect, turn out to have been counterfactually irrelevant, because getting the AI alignment problem right seems really high leverage for the value of the future, it is a good altruistic bet to do work that makes it more likely that the machine will succeed, on the margin. 

This is in marked contrast to how I imagine the MIRI leadership is orienting to the problem: When they look at the world, it looks to them like there isn’t a machine ratcheting towards safety at all. Yes, there are some alignment-like problems that will be solved in the course of AI development, but largely by patches that invite nearest-unblocked strategy problems, and which won’t generalize to extremely powerful systems. As such, MIRI is [edit 2023: was] making a desperate effort to make or to be a machine that ratchets toward progress on safety.

I think this question of “is there a machine that is ratcheting the world towards more AI safety”, is one of the main cruxes between the non-MIRI optimists, and the MIRI-pessimists, which is often overshadowed by the related, but less crucial question of “how sharp will takeoff be?”

On “rationality”

Over the past 3 years, I have regularly taught at AIRCS workshops. These are mainly a recruitment vehicle for MIRI, run as a collaboration between MIRI and CFAR.

At AIRCS workshops, one thing that we say early on is that AI safety is a “Preparadigmatic field”, which is more or less the same as saying that AI alignment is a category II problem. AI safety as a field hasn’t matured to the point that there are known levers for making progress.

And we, explain, we’re going to teach some rationality techniques at the workshop, because those are supposed to help one orient to a paradigmatic field. 

Some people are skeptical that these funny rationality methods are helpful at all (which, to be clear, I think is a quite reasonable position). Other people give the opposite critique, “it seems like clear thinking and effective communication and generally making use of all your mind’s abilities, is useful in all fields, not just preparadigmatic ones.”

But this is missing the point slightly. It isn’t so much that these tools are particularly helpful for prepardigmatic fields, it’s that in preparadigmatic fields, this is the best we can provide.

More developed fields have standard methods that are known to be useful. We train aspiring physicists in calculus, because we have ample evidence that calculus is an extremely versatile tool for making progress on physics, for example. [another example would be helpful here]

We don’t have anything like that for AI safety. There are not yet standard tools in the AI safety toolbox that we know to be useful and that everyone should learn. We don’t have that much traction on the problems.

So as a backstop, we teach very general principles of thinking and problem solving, as a sort of “on ramp” to thinking about your own thinking and how you might improve your own process. The hope is that will translate into skill in getting traction on a confusing domain that doesn’t yet have standard methods.

When you’re flailing, and you don’t have any kind of formula for making research progress, it can make sense to go meta and think about how to think about how to solve problems. But if you can just make progress on the object level, you’re almost certainly better off doing that.

People sometimes show up at AIRCS workshops expecting us to give them concrete technical problems that they can try and solve, and are sometimes discouraged that instead we’re teaching these woo-y or speculative psychological techniques. 

But, by and large, we DON’T have concrete, well-specified, technical problems to solve (note that this is a somewhat contentious claim, see the bit about AI safety above). The work that needs to be done is something like “wandering around in one’s confusion in such a way that one can crystalize well specified technical problems.” And how to do that is very personal and idiosyncratic: we don’t have systematized methods for doing that, such that someone can just follow the instructions and get the desired result. But we’ve found that the woo-y tools seem to give people new levers and new perspectives for figuring out how to do this for themselves, so that’s what we have to share.

As a side note: I have a gripe that “rationality” has come to conflate two things, there’s “how to make progress on natural philosophy when you don’t have traction on the problem” and separately, there’s “effective decision-making in the real world”. These two things have some overlap, but they are really pretty different things. And I think that development on both of them has been hurt by lumping them under one label. 


If I were to offer a critique of Effective Altruism it would be this: EA in general doesn’t distinguish between category I and category II problems. 

Of course, this doesn’t apply to every person who is affiliated with EA. But many EAs, and especially EA movement builders, implicitly think of all problems as class I problems. That is, they are implicitly behaving as if there exists a machine that will convert resources (talent, money, attention) into progress on important problems. 

And, as I said, there are many problems for which there does exist a machine doing that. But in cases where there isn’t such a machine, because the crucial breakthrough that would turn the problem from category II to category I hasn’t occurred yet, this is counterproductive. 

The sort of inputs that allow a category I problem-solving machine to go better or faster, are just very different from the sort of inputs that make it more likely that humanity will get traction on a category II problem. 

Ease of Absorbing Talent

For one thing, more people is often helpful for solving a category I problem, but is usually not helpful for getting traction on a category II problem. Machines solving category I problems can typically absorb people, because (by dint of having traction), they are able to train people to fill useful roles in the machine. 

Folks trying to get traction on a category II problem, by definition, don’t have systematic methods by which they can make progress. So they can’t easily train people to do known-to-be-useful work. 

I think there are clusters that are able to make non-0 progress on getting traction, and that those clusters can sometimes usefully absorb people, but they basically need to be people that have a non-trivial ability to get traction themselves. Because the work to be done is trying to get traction on the problem, it doesn’t help much to have more people who are waiting to be told what to do: the thing that they need to do is figure out what to do. [ 2 ]

Benefit of Marginal Improvements

For another thing, because machines solving category I problems can generally absorb resources in a useful way, and because they are making incremental progress, it can be useful to nudge people’s actions in the direction of a good thing, without them shifting all the way to doing the “optimal” thing. 

  • Maybe someone won’t go vegan, but they might go vegetarian. Or maybe a company won’t go vegetarian, but it can be persuaded to use humanely farmed eggs.
  • Maybe this person won’t change their whole career-choice, but they would be open to choosing more impact oriented projects within their career.
  • Maybe most people won’t become hard-core EAs, but if many people change their donations to be more effective on the margin, that seems like a win.
  • Maybe an AI development company won’t hold back the deployment of their AI system for years, and find a way to insure that it is aligned, but it can be convinced to hire a safety team.

For category I problems interventions on the margin “add up to” significant progress on the problem. What a category I problem means is that there are at least some forms of marginal improvement that, in aggregate, solve the problem.

But in the domain of category II problems, marginal nudges like this are close to useless. 

Because there is not a machine, to which people can contribute, that will incrementally make progress on the problem, getting people to be somewhat more aware of the problem, or care a little about the problem, doesn’t do anything meaningful.

In the domain of a category II problem, the thing that is needed is a breakthrough (or a series of breakthroughs) that will turn it into a category I problem. 

I don’t know how to make this happen in full generality, but it looks a lot closer to a small number of highly talented, highly-invested people who are working obsessively on the problem than it looks like a large mass of people who are aware that the problem is important and will make marginal changes to their lives to help. 

A machine learning researcher who is not interested in really engaging with the hard part of the problem of AI safety, because that would require backchaining from bizarre-seeming science-fiction scenarios, but is working on a career-friendly paper that he has rationalized, by way of some hand-wavy logic as, “maybe being relevant to AI safety someday”, is, it seems to me, quite unlikely to uncover some crucial insight that leads to a breakthrough on the problem. 

Even a researcher who is sincerely trying to help with AI safety, whose impact model is “I will get a PhD, and then publish papers about AI safety” is, according to me, extremely unlikely to produce a breakthrough. Such a person is acting as if there is a role that they can fill, and if they execute well in that role, this will make progress on the problem. They’re acting as if they are part of an existing solution machine.

But, as discussed, that’s not the way it works: we don’t know how to make progress on AI safety, there aren’t straightforward things to do that will reliably make progress. The thing that is needed is people who will take responsibility for independently attempting to figure out how to make progress (which, incidentally, involves tackling the whole problem in its entirety).

If a person is not thinking at all about how to get traction on the core problem, but is just doing “the sort of activities that an AI safety researcher” does, I think it is very unlikely that their activity is going to lead to the kind of groundbreaking work that changes our understanding of the problem enough that AI alignment flips from being in category II to being in category I.

In these category II cases, shifts in behavior, on the margin, do not add up to progress on the problem. 

EA’s causal history

Not making a distinction between category I and category II problems, EA as a culture, has a tendency to try and solve all problems as if they are category I problems, namely by recruiting more people and directing them at AI alignment or AI policy or whatever.

This isn’t that surprising to me, because I think a large part of the memetic energy of EA came from the identification of a specific category I problem: There was a claim that a first world person could easily save lives by donating to effective charities focused on health interventions in the third world. 

Insofar as this is true, there’s a strong reason to recruit as many people to be engaged with EA as possible: the more people involved, the more money moved, the more lives saved.

However, in the years since EA was founded, the intellectual core of the movement updated in two ways: 

Firstly, it now seems more dubious (to me at least) that lives can be cheaply and easily saved that way. (I’m much less confident of this point, and it isn’t a crux for me, so I’ve removed discussion of it to an endnote.[ 3 ] )

And more importantly, EA realized that there are vastly more important problems than global poverty. 

X-risk has been a thread in EA discourse from the beginning: one of the three main intellectual origins of EA was LessWrong (the other two being GiveWell coming from the finance world, and Giving What We Can / 80,000 hours stemming from some Oxford philosophers). But sometime around 2014 the leadership of EA settled on long-termism and x-risk as the “most important thing”. (I was part of the volunteer team for EAG 2015, and I saw firsthand that there was an explicit push to prioritize x-risk.)

Over recent years that shift has taken form: deprioritizing earning to give, but promoting careers in AI policy, for instance. 

I claim that this pivot represents a more fundamental shift than most in EA realize. Namely, a shift from EA being the sort of thing that is attempting to make progress on a category I problem to EA being the sort of thing attempting to make progress on a category II problem. 

EA developed as a movement for making progress on a category I problem: It had as a premise that ordinary people can do a lot of good by moving money to (mostly) pre-existing charities, and by choosing high impact “careers” (where a “career” implies an already-mapped out trajectory). Category I orientation is implicit in EA’s DNA.

And so when the movement tries to make the pivot to focusing on x-risk, it implicitly orients to that problem as if it were a category I problem. “Where can we direct money and talent, to make impact on x-risk”.

For all of the reasons above, I think this is a fundamental error: the inputs that lead to progress on a category I problem are categorically different than those that lead to progress on a category II problem. 

To state my view starkly, if somewhat uncharitably, EA is essentially, shoveling resources into a non-existent x-risk progress machine, imagining that if they shovel enough, this will lead to progress on x-risk, and the other core problems of the world. As I have said, I think that there isn’t yet a machine that can make consistent incremental progress in that way.

But it would be pretty hard, I think, for EA to do something different. This isn’t a trivial error to correct: changing this would entail changing the fundamental nature of what EA is and how it orients to the world.


[ 1 ] – I’ve heard Nate Sores refer to “the curse of Cryonics” which is that anyone who has enough clear thinking independent thought to realize that cryonics is important, can also see that there are vastly more important problems.

[ 2 ] –  I think that this is a little bit of an oversimplification. I think there are ways that people can contribute usefully in a mode that is close to “executing on what some people they trust think is a good idea”, but you do need a critical mass of people who are clawing for traction themselves, or this doesn’t work. Therefore the regent is people clawing for traction, and capacity to absorb conscientious ability-to-execute talent is limited.

[ 3 ] – The world is really complicated, and I’m not sure how confident to be that our charitable interventions are working. This post by Ben Hoffman pointing out that the expected value distribution for deworming interventions trends into the negative, but most EAs don’t seem aware of this, seems on point. 

Further (though this is my personal opinion, more than any kind of consensus), the argument Eliezer makes here is extremely important for anyone taking aim at eradicating poverty. If there is some kind of control system that keeps people poor, regardless of the productivity of society, this suggests that there might be some fundamental blocker to ending poverty: until that control system is addressed somehow, all of the value ostensibly created by global health interventions is being sucked away. 

(Admittedly, this argument holds somewhat less force if one is aiming simply to reduce human suffering in the next year, rather than any kind of long term goal like “permanently and robustly end poverty.”

Considerations like that one suggest that we should be much less certain that our favored global poverty interventions are effective. Instead of (or perhaps, in addition to) rushing to move as many resources to those interventions as possible, it seems like the more important thing is to continue trying to puzzle out, via experiments and modeling and whatever tools we have, how the relevant social systems work, to verify that we’re actually having the positive effects that we’re aiming for. It seems to me that even in the domains of global poverty, we still need a good deal of exploration relative to exploitation of the interventions we’ve uncovered.

Relatedly it seems to me that focusing on charity is somewhat myopic: it is true that there is a machine eradicating poverty, but that machine is called capitalism, not charity donation. Maybe the charity donations help, but I would guess that if you want to really have the most impact here, the actual thing to do is not give to charities but something more sophisticated that engages more with that machine. (I might be wrong about that. Maybe in fact global health interventions are, actually the best way to unblock the economic engine so that capitalism can pull the third world out of poverty faster).

My current high-level strategic picture of the world

Follow up to: My strategic picture of the work that needs to be done, A view of the main kinds of problems facing us

This post outlines my current epistemic state regarding the most crucial problems facing humanity and the leverage points that we could, at least in principle, intervene on to solve them. 

This is based on my current off-the-cuff impressions, as opposed to careful research. Some of the things that I say here are probably importantly wrong (and if you know that to be the case, please let me know). 

My next step here, is to more carefully research the different “legs” of this strategic outline, shoring up my understanding of each, and clarifying my sense of how tractable each one is as an intervention point.

None of this constitutes a plan. More like, this is a first sketch, to facilitate more detailed elaboration.

The Goal

My overall goal here is to explore the possible ways by which humanity achieves existential victory. By existential victory, I mean, 

The human race[1] survives the acute risk period, and enters a stable period in which it (or our descendants) are able to safely reflect on what a good universe entails, and then act to make the reachable universe good.

This entails humanity surviving all existential risk and getting to a state where existential risk is minimized (for instance, because we are now protected from most disasters by an aligned superintelligence or by a coalition of aligned super intelligences).

Possibly, there is an additional constraint that the human race not just survive, but remain “healthy”along some key dimensions, such as control over our world, intellectual vigor, freedom from oppressive power-structures, trauma, if detriments along those dimensions are irreparable and would therefore permanently limit our ability to reflect on what is Good.

This document describes the two basic trajectories that I can currently see, by which we might systematically achieve that goal (as opposed to succeeding by luck).

The Two Problems

In order to get to that kind of safe attractor state there appear to be two fundamental classes of problems facing humanity: technical AI alignment, and civilizational sanity.

By “technical AI alignment”, I mean the problem of discovering how to build and deploy super-humanly powerful AI systems (embodied either in a singleton, or an “ecosystem” of AIs), safely, in a way that doesn’t extinct humanity, and broadly leaves humans in control of the trajectory of the universe.

By “civilizational sanity”, I mean to point at the catch-all category of whatever causes high leverage decision makers to make wise, scope-sensitive, non-self-destructive, choices.

Civilizational Sanity includes whatever factors cause your society to do things like “saving ~500,000 lives by running human challenge trials on all existing COVID vaccines in February 2020, scaling up vaccine production in parallel with market mechanisms, and then administering vaccinations, en masse, to everyone who wants them, with minimal delay”, or something at least that effective, instead of what the actually did.

It also includes whatever it takes for a government to successfully identify and successfully carry through good macroeconomic policy (which I’ve heard is NGPD targeting, though I don’t personally know).

And it includes whatever factors cause it to be the case that your civilization suddenly acquiring god-like powers (via transformative AI or some other method), results in increased eudaimonia instead of in some kind of disaster.

I think that the only shot we have of exiting the critical risk period by something other than luck is sufficient success at solving AI alignment sufficient success at solving civilizational sanity, and implementing our solution.

(The “Strategic Background” section of this post from MIRI outlines a similar perspective of the high level problem as I outline in this document. However it elaborates, in more detail, a path by which AI alignment would allow humanity to exit the acute risk period (minimally aligned AI -> AGI powered technological development -> risk mitigating technology -> pivotal act that stabilizes the world), and de-emphasizes broad-based civilizational sanity improvements as another path out of the acute risk period.)


To some degree, solutions to either technical alignment or civilizational sanity can substitute for each other, insofar as a full solution to one of these problems would approximately obviate the need for solving the other.

For instance, if we had a full and complete understanding of AI alignment, including rigorous proofs and safe demonstrations of alignment failures, fully-worked-out safe engineering approaches, and crisp theory tying it all together, we would be able to exit the critical risk period. 

Even if it wasn’t practical for a small team to code up an aligned AI and foom, with that level of detail, it would be easy to convince the existing AI community (or perhaps just the best equipped team) to build aligned AI, because one could make the case very strongly for the danger of conventional approaches, and provide a crisply-defined alternative.

On the flip side, at some sufficiently high level of global civilizational sanity, key actors would recognize the huge cost to unaligned AI, and successfully coordinate to prevent anyone from building unaligned AI until alignment theory has been worked out.

We can make partial progress on either of these problems. The task facing humanity as a whole is to make sufficient progress on one, the other, or both, of these problems in order to exit the acute risk period. Speaking allegorically, we need the total progress on both to “sum to 1.” [2]

A note on “sufficiency”

Above, I write “I think that the only shot we have of exiting the critical risk period by something other than luck is sufficient success at solving AI alignment or sufficient success at solving civilizational sanity…”.

I want to clearly highlight that the word “sufficient” is doing a lot of work in that sentence. “Sufficient” progress on AI alignment or “sufficient” progress on civilizational sanity is not yet operationalized enough to be a target. I don’t know what constitutes “enough” progress on either one of these, and I don’t know if I could recognize it if I saw it. 

Civilizational Sanity, in particular, is always a two place function: I can only judge a civilization to be insane relative to my own epistemic process. If societal decision making improves, but my own process improves even faster, the world will still seem mad to me, from my new more privileged vantage point. So in that sense, the goal posts should be constantly moving. 

My key claim is only that there is some frontier defined by these axes such that, if the world moves past that frontier, we will be out of the acute risk period, even though I don’t know where that frontier lies.  

A note on timelines

When I talk about civilizational sanity interventions as a line of attack on AI risk, folks often express skepticism that we have enough time: AI timelines are short, so short that it seems unlikely that plans that attempt to reform the decision making process of the whole world will bear fruit before the 0 hour. [3]

I think that this is wrong-headed. It might very well be the case that we don’t have time for any sufficiently good general sanity boosting plans to reach fruition. But it might just as well be the case that we don’t have time for our technical AI alignment research to progress enough to be practically useful.

Our basic situation (I’m claiming), is that we either need to get to correct alignment theory, or to a generally sane civilization before the transformative AI countdown reaches 0. But we don’t know how long either of those projects will take. Reforming the decision processes of the powerful places in the world might take a century or more, but so might solving technical alignment.

Absent more detailed models about both approaches, I don’t think we can assume that one is more tractable, more reliable, or faster, than the other.

AI alignment in particular?

This breakdown is focused on the AI alignment problem in particular (it’s taking up half of the problem space), giving the impression that AI risk, is the only, or perhaps the most dangerous, existential risk.

While AI risk does seem to me to pose the plurality of the risk to humanity, that isn’t the main reason for breaking things down in this way. 

Rather it’s more that every intervention that I can see that has a shot of moving us out of the acute risk period goes through either powerful AI, or much saner civilization, or both. [I would be excited to have counterexamples, if you can think of any.]

We need protection against bio-risk, nuclear war, and civilizational collapse / decline. But robust protection against any one of those doesn’t protect us from the others by default. Aligned AI and a robustly sane civilization are both general enough that a sufficiently good version of either one would eliminate or mitigate the other risks. Any other solution-areas that have that property, and don’t flow through aligned AI or a general sane civilization would deserve their own treatment in this strategic map, but as of yet, I can’t think of any.

Technical AI alignment

I don’t have much to say about the details of this project. In broad strokes, we’re hoping to get a correct enough philosophical understanding of the concepts relevant to AI alignment, formalize that understanding as math, and eventually develop those formalizations into practical engineering approaches. (Elaboration on this trajectory here.)

(There are some folks who are going straight for developing engineering frameworks [links], hoping that they’ll either work, or give us a more concrete, and more nuanced understanding of the problems that need to be solved.)

It seems quite important if there are better or faster ways to make progress here. But my current sense of things is that it is just a matter of people doing the research work + recruiting more people who can do the research work. See my diagram here

Civilizational Sanity

Follow up to: What are some Civilizational Sanity Interventions

This second category is much less straightforward. 

Within the broad problem space of “causing high-level human decision making to be systematically sane”, I can see a number of specific lines of attack, but I have wide error bars on how tractable each one is.

Those lines of attack are

  1. Unblocking governance innovation
  2. Powerful intelligence enhancement
  3. Reliable, scalable, highly effective resolution of psychological trama
  4. Chinese ascendency

I’m sure this list isn’t exhaustive. These four are the only interventions that I currently know of that seem like (from my current epistemic state) they could transform society enough that we could, for instance, handle AI risk gracefully. 

Relationship between these legs

In particular, there’s an important open question of how these approaches relate to each other, and the broader civilizational sanity project. 

I described above that I think that “AI alignment” and “civilizational sanity” have an “or” or a “sum” relationship: sufficient progress on only one of them can allow us to exit the critical risk period.

There might be a similar relationship between the following civilizational sanity interventions: pushing on any one of them, far enough, leads to a large jump in civilizational sanity, kicking off a positive feedback loop. OR it might instead be that only some of these approaches attack the fundamental problem, and without success on that one front, we won’t see large effects from the others.

Unblocking Innovation in Governance

Better Governance is Possible

The most obvious way to improve the sanity of high-leverage decisions on planet earth is governmental reform.

Our governmental decision making processes are a mess. National politics is tribal politics writ-large: instead of a societal-level epistemology trying to select the best policies, we have a bludgeoning match over which coalitions are best, and which people should be in charge. Politicians are selected on the basis of electability, not expertise, or even alignment with society, yet somehow we seem to be ending up with candidates that no one is enthusiastic about. Congress is famously in a semi-constant self-strangle-hold, unable to get anything done. And the constraints of politics forces those politicians to say absurd things in contradiction with, for instance, basic economic theory, and to grandstand about things that don’t matter and (even worse) things that do.

The current system has all kinds of analytical demonstrable game theoretic drawbacks that make undesirable outcomes all but inevitable: including a two-party system that no one likes much, principal agent problems between the populous and the government, and net societal losses as a result of allocation of benefits to special interest groups.

There hasn’t been a major innovation in high-level governance, since the invention and wide-scale deployment of democracy in the 18th century. It seems like we can do better. We could, in principle, have governmental institutions that are effective epistemologies, are able to identify problems and determine and act on policies at the frontier of society’s various tradeoffs instead at the frontier of the tradeoffs of political expediency.

And because governments have so much influence, more effective information processing in that sector could lead to better institution designs in all other sectors. Public policy is in part a matter of creating and regulating other institutions. Saner government decision making entails setting up efficient and socially-beneficial incentives for health, education, etc, which selects for effective institutions in those more specific sectors. In this way, government is a meta-institution that shapes other institutions. (It’s unclear to me to what degree this is true. How much does better policy at the governmental level, automatically correct the inefficiencies of, say, the medical bureaucracy?)

One might therefore think that a particularly high leverage intervention is to develop new systems of governance. But humanity has a pretty large backlog of governance innovations that seem much better than our current setups on a number of dimensions, from the simple, like using Single Transferable Vote instead of First Past the Post, to the radical, like Futarchy, or the abolition of private property in favor of a COST system.

It seems to me that the bottleneck for better governmental systems is not possible alternatives, but rather the opportunity to experiment with those alternatives. Apparently, there are approximately no venues available for governmental innovation on planet earth.

This is not very surprising, because incumbents in power, benefit from the existing power structure and therefore oppose replacing it with a different mechanism. In general, everyone who has the ability to gatekeep experiments with new governance mechanisms is incentivized to be threatened by those experiments

However, widespread experimentation and innovation in governance would likely be a huge deal, because it would allow humanity as a whole to identify the most successful mechanisms, which, having been shown to work, could be tried at larger scales, and eventually widely adopted.

Experimentation Leads to Eventual Wide Adoption

The basic argument that merely allowing experimentation will eventually lead to better governance on a global scale is as follows: 

Many governance mechanisms, if tried, will not only 1) surpass existing systems, but 2) will surpass existing systems in a legible way, both in aggregate outcomes (like economic productivity, employment, and tax-rate), and from direct engagement with those systems (for instance, once voters become familiar with Futarchy, it might seem absurd that you would elect individuals who are both supposed to represent one’s values and have good plans for achieving those values). 

If the condition of “legible superiority” holds, there would be pressure to replicate those mechanisms elsewhere, at all different scales. Eventually, the best innovations simply become the new standard practices.

Similarly, for many incentive-aligning interventions, not using such methods is a stable attractor: it is in the interests of those in power to resist their adoption. But also, wide-spread use of such methods is also a stable attractor. Once common, it is in the interests of those in power to keep using them. As Robin Hanson says of prediction markets:

I’d say if you look at the example of cost accounting, you can imagine a world where nobody does cost accounting. You say of your organization, “Let’s do cost accounting here.”

That’s a problem because you’d be heard as saying, “Somebody around here is stealing and we need to find out who.” So that might be discouraged.

In a world where everybody else does cost accounting, you say, “Let’s not do cost accounting here.” That will be heard as saying, “Could we steal and just not talk about it?” which will also seem negative.

Similarly, with prediction markets, you could imagine a world like ours where nobody does them, and then your proposing to do it will send a bad signal. You’re basically saying, “People are bullshitting around here. We need to find out who and get to the truth.”

But in a world where everybody was doing it, it would be similarly hard not to do it. If every project with a deadline had a betting market and you say, “Let’s not have a betting market on our project deadline,” you’d be basically saying, “We’re not going to make the deadline, folks. Can we just set that aside and not even talk about it?”

This may generalize to many institution designs that are better than the status quo.

For these reasons, finding ways around the general moratorium of governmental innovation, so that new governance mechanisms can be tried, has possibly huge dividends.

Strategies to allow for Experimentation

Currently, the only approaches I’m aware of for creating spaces for governmental innovation are charter cities and sea steading.

Charter cities are bottle-necked on legal restrictions, and the practical coordination problem of getting a critical mass of residents. But I’m hopeful that COVID has caused a permanent shift to remote work, which will give people more freedom in where to live, and increase competition-in-governance between cities and states, who want to attract talent.

Seasteading is currently bottlenecked on the engineering problem of creating livable floating structures, cheaply enough to be scalable. [Double check if cost is actually the key concern.]

Repeatable reform templates

I wonder if there might be a third, more abstract, line of attack on unblocking governance innovation: developing a repeatable method to change existing governmental structures in a way that incentivizes powerful incumbents.

If it were possible to simply buy out incumbents and overhaul the system, that might be a huge opportunity. However, I guess that in most liberal democracies, this is both illegal and generally repugnant (plus politicians are beholden to their party which might object), such that existing power-holders would not accept a straightforward “money for institutional reform” trade.

But there may be some other version which, in practice, incentivizes power-holders to initiate governmental reform. Possibly by letting those power-holders keep their power for some length of time, and also recieve the credit for the change. Or maybe a setup that targets those people before they take power, when they are more idealistic, and more inclined to make an agreement to cause reform, conditional on all their peers doing the same, in the style of a free-state agreement.

If we could find a repeatable “template” for making such deals, it might unlock the ability to iteratively improve existing governmental structures.

I’m not aware of any academic research in this area (both historical case studies of how these kinds of shifts have occurred in the past and analytic models of how to incentivize such changes seem quite useful to me), nor any practical projects aiming for something like this.

Intelligence enhancement

One might posit that the sort of incentive problems that lead to bizarre institutional policies is the inevitable result of the fact that doing better requires understanding many abstract, non-intuitive concepts and/or careful reasoning in complicated domains, and the average person is of average intelligence, which is insufficient to systematically identify better policies and institutional set-ups over worse ones at the current margin.

In this view, the fundamental problem is that our civilizational decision making processes are much worse than is theoretically possible, because we are collectively not smart enough to do better. Some of us can identify the best policies (or at least determine that one policy is better than another), some of the time, but that relies on understanding that is esoteric to many more people, including many crucial decision makers.

But if the average intelligence of the population as a whole was higher, more good ideas would seem obviously good to more people, and it would be substantially easier to get critical mass of acceptance of sane policies on the object level, as well as better information processing mechanisms. (For instance, If the IQ curve was shifted 35 points to the right, many more people would be able to “see at a glance” why prediction markets are an elegant way of aggregating information.)

More intelligence -> More understanding of important principles -> Saner policies

So it might be that the most effective lever on civilizational sanity is intervening on biological intelligence.

The most plausible way to do this is via widespread genetic enhancement, with either selection methods like iterated embryo selection, or direct gene editing using methods like CRISPR.

My current understanding is that these methods are bottle-necked on our current knowledge of the genetic predictors of intelligence: if we knew those more completely, we would basically be able to start human genetic engineering for intelligence. It seems like that knowledge is going to continue to trickle in as we get better at doing genomic analysis and collect larger and larger data sets. [Note: this is just my background belief. Double-check] Possibly, better Machine Learning methods will lead to a sudden jump in the rate of progress on this project?

On the face of it this suggests that any project that could provide a breakthrough in decoding the genetic predictors of intelligence could be high leverage.

Aside from that, there’s some risk that society will fork down a path in which human genetic enhancement is considered unethical, and will be banned. I’m not that worried about this possibility, because as long as some people / groups are doing this for their children there is a competitive pressure to do the same, and I think it is pretty unlikely that China, which is competitive, at the national level, with the rest of the world, and in which families already regularly exert huge efforts to give their children competitive advantages relative to societies at large, will forgo this opportunity. And if China invests in human genetic enhancement, the US will do the same out of a fear of Chinese dominance.

Some other avenues for human intelligence enhancement include nootropics, which seems much less promising for the basic algernonic argument, and brain computer interfaces like Neuralink. Of the latter, it is currently unknown which dimensions of human cognition can be readily improved, and if such augmentation will lend itself to wisdom or whatever the precursors to civilizational sanity are.

There’s also the possibility of using sufficiently aligned AI assistants to augment our effective intelligence and decision making. Absent our alignment research giving us very clear criteria for aligned systems, this seems like a very tricky proposition, because of the problems described in this post. But in worlds where AI technology continues to improve along its current trajectory, it might be that using limited AI systems as leverage for improving our decision making and research apparatus, to further improve our alignment technologies, is the best way to go.

A note on improving public understanding by methods other than intelligence enhancement:

Possibly there are other ways to substantially increase each person’s individual intellectual reach, so that we can all come to understand more, without increasing biological intelligence. Things in the vein of “better education”. 

I’m pretty dubious of these. 

I think I have far above average skill in communicating (both teaching and being taught) complex or abstract ideas. But even being pretty skilled, for a human, it is just hard. Even when the conditions are exceptional (a motivated student working one-on-one with a skilled tutor who understands the material and can model / pace to the student’s epistemic state), it just takes many focused hours to grasp many important concepts.

I think that any educational intervention effective enough to actually move the needle on civilizational sanity would have to be very radical: so transformative that it would be a general boost in a person’s learning ability, i.e. an increase in effective intelligence. That said, if anyone has ideas for interventions that could increase most people’s intellectual grasp, I would love to hear them.

(…Possibly a dedicated and well executed campaign to educate the public at large ins some small set of extremely important concepts, with the goal of shifting what sorts of explanations sounds plausible to most people (raising the standard for what kinds of economic claims people can make in public with a straight face, for instance), would be helpful on the margin. But this seems to me like an enormous undertaking which would require pedagogical and mass-communication knowledge that I don’t know if anyone has. And I’m not sure how helpful it would be. Even if the whole world understood econ 101, the real world is more complicated than econ 101, such that I don’t know how much that alone would aid people’s assessment of which policies are best. I suppose it would cut out some first-order class of mistakes.)

I do think there are definitely ways to increase our collective intellectual reach, so that societies can systematically land on correct conclusions without increasing any individual person’s intellectual reach or understanding. These include the governance mechanisms I alluded to in the last section. 

There might also exist society wide “public services”, that could do something like this while side-stepping government bureaucracy entirely, like the dream of arbital. I’m not sure how optimistic I should be about those kinds of interventions. The only comparable historical examples that I can think of are wikipedia and public libraries. Both of these seem like clearly beneficial public goods with huge flow-through effects, which make information easily available to people who want it and wouldn’t otherwise have access. But neither one seems to have obviously improved high-level civilizational decision making relative to the counterfactual.

Clearing “Trauma”??

[The following section is much more speculative, and I don’t yet know what to think of it.]

There’s another story in which the main source of our world’s dysfunction is self-perpetuating trauma patterns. 

There are many variations of this story, which differ in important details. I’ll outline one version here, noting that something like this could be true without this particular story being true.

According to this view…

virtually everyone is traumatized (or if you prefer, “socialized”), into dysfunctional and/or exploitative behavior patterns, to greater or lesser degrees, in the course of growing up. 

The central problem isn’t (just) that everyone is following their local self-interest in globally destructive systems, it is actually much worse than that: people are conditioned in such a way that they are not even acting in their narrow self interest. Instead humans myopically focus on goals, and execute strategies, that are both 1) globally harmful and 2) not even aligned with their own “true” reflective, preference, due to false assumptions underlying their engagement with the world. This myopia also inhibits their ability to think clearly about parts of the world that are related to their trauma

(As a case in point, I think it is probably the case that there are lots of people aggressively pursuing AGI, and who are instinctively flinch away from any thought that AGI might be dangerous, because they have a deep, unarticulated, belief that if they can be successful at that, their parents will love them, or they won’t feel lonely any more, or something like that.)

They’ve been conditioned to feel threatened, or triggered by, a huge class of behaviors that are globally productive, like accurate tracking of harms, and many kinds of positive-sum arrangements.

Furthermore, the core reason why most people can’t seem to think or to have “beliefs in the sense of anticipations about the world” is not (mostly) a matter of intelligence, but rather that their default reasoning and sense-making functions have been damaged by the institutions and social contexts in which they participate (school, for instance).

Those traumatizing contexts  are not designed by conscious malice, but they are also not necessarily incidental. It’s possible that they have been optimized to be traumatizing, via unconscious social hill-climbing.

This is because trauma-patterns are replicators: they have enough expressive power to recreate themselves in other humans, and are therefore subject to a selection pressure that gradually promotes the variations that are most effective at propagating themselves. (Furthermore, there’s a hypothesis that for a traumatized mind, one of the best ways to control the environment to make it safe is to similarly traumatize people in the environment.) The net result is horrendous systems of hurt people hurting people, as a way to pass on that particular flavor of hurt to future generations.

Part of the hypothesis here is that these trauma patterns have always been a thing in human societies, but there has also typically been a counter-force, namely that if you need to work together and have a good understanding of the physical world to survive in a harsh environment, your epistemology can’t be damaged too badly, or you’ll die. But in the modern world, we’ve become so wealthy, and most people have become so divorced from actual production, that that counter-force is much diminished.

Implications for Improving the World

If this story is true, governmental reform is likely to fail for seemingly-mysterious reasons, because there is selection pressure optimizing against good institutional epistemology, over and above bureaucratic inertia and the incentives of entrenched power-holders. If you don’t defuse the underlying trauma-patterns, any system that you try to build will either fail or be subverted by those trauma-patterns..

And under this story, it’s unclear how much intelligence enhancement would help. All else being equal, it seems (?) that being smarter helps in developmental work, and healing from one’s personal traumas, but it might also be the case that greater intelligence enables more efficient propagation of trauma patterns. 

If this story is largely correct, it implies that the actual bottleneck for the world is understanding trauma and trauma resolution methods well enough to heal trauma-patterns at scale. If we can do that, the agency and intellect of the world (which is currently mostly suppressed), will be unblocked, and most of the other problems of the world will approximately solve themselves.

I also don’t know to what extent there already exist methods for reliably and rapidly resolving trauma patterns, and the degree to which the bottleneck is actually one 1 to n scaling rather than 0 to 1 discovery. Certainly there are various methods that at least some people have gotten at least some benefit from, though it remains unclear how much of the total potential benefit even the best methods provide to the people who have gotten the most from them.

I don’t know what to think of all of this yet, the degree to which trauma is at the root of the world’s ills, the degree to which things have actually been optimized to be traumatizing as opposed to ending up that way by accident, or even if “trauma” is a meaningful category pointing at a real phenomenon that is different from “learning” in a principled way.

I’ll note that even if the strong version of this story is not correct, it might still be the case that many people’s intellectual capability is handicaped by psychological baggage. So it might be the case that research into effective trauma-resolution methods may be an effective line of attack on improving the world’s intellectual capability. For instance, finding a non-scalable method for reliably resolving trauma might be an important win, because at minimum, we could apply it to all of the AI safety researchers. This might be one of the possible gains on the table for speeding progress on the alignment problem. 

(Though this is also something to be careful of, since such methods would likely have some kind of psychological side effects, and we don’t necessarily want to reshape the psyches of earth’s contingent of alignment researchers all in the same way. I worry that we might have already done this to some degree with circling: Circling seems quite good and quite helpful, but I think that we should be concerned that if we make a mistake about what directions are good to push the culture of our small AI safety community, we’re likely to destroy a lot of value.)

The Rise of China??

In the first section describing what I meant by civilizational sanity up above, I noted “sensible response to COVID” as one indicator of civilizational sanity. Notably, China’s covid response, seems, overall, to have been much more effective than the West’s.

This doesn’t seem like an aberration, either. As a non-expert foreigner, looking in, it looks like China’s society/government is overall more like an agent than the US government. It seems possible to imagine the PRC having a coherent “stance” on AI risk. If Xi Jinping came to the conclusion that AGI was an existential risk, I imagine that that could actually be propagated through the chinese government, and the chinese society, in a way that has a pretty good chance of leading to strong constraints on AGI development (like the nationalization, or at least the auditing of any AGI projects).

Whereas if Joe Biden, or Donald Trump, or anyone else who is anything close to a “leader of the US government”, got it into their head that AI risk was a problem…the issue would immediately be politicized, with everyone in the media taking sides on one of two lowest-common denominator narratives each straw-manning the other. One side would attempt to produce (probably senseless) legislation in the frame of preventing the bad guys from doing bad things, while the other side goes to absurd lengths to block them as a matter of principle, and in the end we’re left with some regulation on tech companies that doesn’t cleave to the actual shape of the problems at all, and pisses off researchers who are frustrated that this anthropomorphizing, “AI risk” hubbub, just made their lives much harder, alienating them.

(One might think that this is actually a national security issue, and it would be taken more seriously than that, but COVID was a huge public health issue, and we managed to politicize wearing masks.

So, maybe it would be good for the world if China was the dominant world power?

I think that overall, China’s society, and high level decision making is currently saner than that of the western world. So maybe on the margin, the world is better off if China were more dominant. 

However, I have a number of reservations.

  1. China’s human rights record is not great. Apparently, there is an ongoing genocide of the Uighurs, happening right now. My deontology is pretty reluctant to put mass murders in charge of the world.
    1. I’m not sure how to think about this. Genocide is extremely bad. And furthermore we have a strong, coordinated norm to censor and take action against it (although, obviously not that strong, since I don’t know of a single person who has taken any action other than (occasionally) tweeting news articles, in this case). But also, I’m not sure whether I should just parse this as standard practice for great powers / ruling empires. The US has committed similarly bad atrocities in its history (slavery and the extermination/relocation of the Indians come to mind), and as far as I know, continues to commit similar atrocities. And the stakes are literally astronomical. Does the specter of extinction and the weight of all future earth-originating civilization mean we should just neglect contemorary genocide in our realpolitik calculations? I’m not comfortable with that, but I don’t know what to think about it.
  2. I don’t have a strong reason to expect that China’s institutions are fundamentally better functioning than the US’s, I think they’re just younger. If China is exhibiting the kind of functionality and decisiveness, that the US was enjoying 60 years ago, then it seems pretty plausible that 60 years from now (or maybe sooner than that, on the general principle that the world is moving faster now), the chinese system will be similarly sclerotic and dysfunctional.
    1. Indeed, we might make a more specific argument that institutions are able to remain functional so long as there is growth, because a growing pie means everyone can win. But when growth slows or stops there’s no longer a selection pressure for effectiveness, and institutions entrench themselves because rent seeking is a better strategy. (Or maybe the causality goes the other way: there’s a continual, gradual, increase in rent-seeking as actors entrench their power-bases, which gradually cuts out production, until all (or almost all) that’s left is rent-seeking. In any case, I think China has got to be nearing the top of it’s explosive s-curve, and I don’t expect its national agency to be robust to that.
  3. I would guess, not knowing much more than a stereotype of Chinese culture, that even if it is saner and more effective than western culture right now, the west has more of the generators that can lead to further increases in civilizational sanity. I might be totally off base here, but the East’s emphasis on conformity and social hierarchy seems like it would make it even MORE resistant to, say, the wide-scale adoption of prediction markets than the US is. (Though maybe the ruling party is enough of an unincentivized incentivizer to overcome this effect?) I suspect that it is even less likely to generate the kind of iconoclastic thinkers who would think up the idea of prediction markets in the first place. It would be quite bad if we got some boost in civilizational sanity with the rise of China, but that Chinese dominance curtained any further improvement on that dimension. 
  4. It is currently unclear to me how much it matters which culture the intelligence explosion takes place in.
    1. Under the assumption of a strong attractor in the human CEV, it seems like it doesn’t matter much at all: we’re all, currently, so radically confused about Goodness, that the apparently-huge cultural differences are just noise. And even if that’s not true, I would guess that the differences between my ideal future, and some human descended society, are probably massively outweighed by the looming probability of extinction and a sterile universe. Chinese people live happy lives in China, now, and have lived happy lives throughout history, even if they tolerate a level of conformity and restriction-of-expression that I would find stifling, to say the least.
    2. However, I think it might not be an exaggeration to say that the CPC believes that thoughts should be censored to serve the state. I can imagine technological augmented versions of thought control that are so severe as to permanently damage the human civilization’s ability to think together, which might constitute the sort of irreparable “damage” that prevent us from deliberating to discover and the executing on a good future. If this sort of technology is more likely to come from China than from the west, Chinese supremacy might be disastrous.
    3. It does seem really important that AGI not lock the future into an inescapable immortal dictatorship (Probably? Maybe most people just live basically happy lives in an immortal dictatorship?). And I want to track if that is more likely to result from an intelligence explosion directed by China than by my native culture.

Summing up

  • The problem facing humanity in this era is figuring out how to exit the acute risk period, systematically, instead of by luck. 
  • The only ways that I can see to do this, depend on aligned AI or a much saner human civilization. 
  • So the problem breaks down into two subproblems: solve AI alignment or achieve enough civilizational sanity.
  • AI alignment research is going apace, and if there are ways to speed it up, that would be great.
  • I can currently see four lines of attack on civilizational sanity: unblocking innovation in governance, intelligence enhancement, and possibly widespread trauma resolution, or Chineses ascendancy. 
  • All of those plans might turn out to be on-net bad for the world, on further reflection.


Some of my questions for going forward:

  1. How long until transformative AI arrives?
  2. Are there tractable ways to speed Technical AI alignment substantially?
  3. Are there tractable ways to unblock governance experimentation?
  4. Follow up on charter city projects
  5. What’s blocking sea steading? Is it cost as I believe?
  6. How large are the expected flow-through effects of governmental sanity interventions on other sectors?
  7. Conditional on unblocking innovation in governance, how long is it likely to take for the best innovations to propagate outward until they are standard best practices?
  8. What’s the bottleneck for human genetic intelligence augmentation
  9. Along what dimensions would Neuralink improve human capabilities?
  10. Is “trauma” a natural kind? To what extent is it true that psychological trauma is driving exploitative and counter-productive organizational patterns in the world?
  11. How much saner is China? How long will the Chinese system remain “alive”?
  12. How different will the long term future be, if the intelligence explosion happens in one culture rather than another?


[1] –  Or some civilization or other mechanism, bearing human values.

[2] –  Though of course, there isn’t a linear relationship between the individual progress bars, and total victory. We might be “70%” of the way to a full solution to both problems (whatever that means), but between the two, not have enough of the right pieces to get a combined solution that lets us exit the critical risk period. That’s why it is only allegorical.

[3] – And, in contrast, I sometimes talk with people who are so pessimistic about alignment work, that they take it for granted that the thing to do is take over the world by conventional means.

Psychoanalyzing, people seem to gravitate to the line of attack that is within their skillset, and therefore feels more comfortable to think about. This seems like a perfectly good heuristic for specialization, but it doesn’t seem like a particularly good way to identify which approach is more tractable in the abstract.

How do we prepare for final crunch time? – Some initial thoughts

[epistemic status: Brainstorming and first draft thoughts.

Inspired by something that Ruby Bloom wrote and the Paul Christiano episode of the 80,000 hours podcast.]

One claim I sometimes hear about AI alignment [paraphrase]:

“It is really hard to know what sorts of AI alignment work are good, this far out from transformative AI. As we get closer, we’ll have a clearer sense of what AGI / Transformative AI is likely to actually look like, and we’ll have much better traction on what kind of alignment work to do. In fact, it might be the case that MOST of the work of AI alignment is done in the final few years before AGI, when we’ve solved most of the hard capabilities problems already and we can work directly, with good feedback loops, on the sorts of systems that we want to align.”

Usually this is taken to mean that the alignment research that is being done today is primarily to enable or make easier future, more critical, alignment work. But “progress in the field” is only one dimension to consider in boosting the work of alignment researchers in final crunch time.

In this post I want to take the above posit seriously, and consider the implications. If most of the alignment work that will be done is going to be done in the final few years before the deadline, our job in 2021 is mostly to do everything that we can to enable the people working on the problem in the crucial period (which might be us, or our successors, or both) so that they are as well equipped as we can possibly make them.

What are all the ways that we can think of that we can prepare now, for our eventual final exam? What should we be investing in, to improve our efficacy in those final, crucial, years?

The following are some ideas.


For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.

A different kind of work

Most current AI alignment work is pretty abstract and theoretical, for two reasons. 

The first reason is a philosophical / methodological claim: There’s a fundamental “nearest unblocked strategy” / overfitting problem. Patches that correct clear and obvious alignment failures are unlikely to generalize fully, you’ll only have constrained unaligned optimization to channels that you can’t recognize. For this reason, some claim, we need to have an extremely robust, theoretical understanding of intelligence and alignment, ideally at the level of proofs.

The second reason is a practical consideration: we just don’t have powerful AI systems to work with, so there isn’t much in the way of tinkering and getting feedback.

The second objection becomes less relevant in final crunch time: in this scenario, we’ll have powerful systems 1) that will be built along the same lines as the systems that it is crucial to align and 2)  that will have enough intellectual capability to pose at least semi-realistic “creative” alignment failures (ie, current systems are so dumb, and liven in such constrained environments, that it isn’t clear how much we can learn about aligning literal superintelligences from them.)

And even if the first objection ultimately holds, theoretical understanding often (usually?) follows from practical engineering proficiency. It seems like it might be a fruitful path to tinker with semi-powerful systems trying out different alignment approaches empirically, and tinkering to discover new approaches, and then backing up to do robust theory-building given much richer data about what seems to work.

I could imagine sophisticated setups that enable this kind of tinkering and theory building. For instance, I imagine a setup that includes:

  • A “sandbox” that afford easy implementation of many different AI architectures and custom combinations of architectures, with a wide variety easy-to-create, easy-to-adjust, training schemes, and a full suite of interpretability tools. We could quickly try out different safety schemes, in different distributions, and observe what kinds of cognition and behavior result.
  • A meta AI that observes the sandbox, and all of the experiments therein, to learn general principles of alignment. We could use interpretability tools to use this AI as a “microscope” on the AI alignment problem itself, abstracting out patterns and dynamics that we couldn’t easily have teased out with only our own brains. This meta system might also play some role in designing the experiments to run in the sandbox, to allow it to get the best data to test it’s hypotheses.
  • A theorem prover that would formalize the properties and implications of those general alignment principles, to give us crisply specified alignment criteria by which we can evaluate AI designs.

Obviously, working with a full system like this is quite different than abstract, purely theoretical work on decision theory or logical uncertainty. It is closer to the sort of experiments that the OpenAI and Deep Mind safety teams have published, but even that is a pretty far cry from the kind of rapid-feedback tinkering that I’m pointing at here.

Given that the kind of work that leads to research progress might be very different in final crunch time than it is now, it seems worth trying to forecast what shape that work will take and trying to see if there are ways to practice doing that kind of work before final crunch time.


Obviously, when we get to final crunch time, we don’t want to have to spend any time studying fields that we could have studied in the lead-up years. We want to have already learned all the information and ways of thinking that we’ll want to know, then. It seems worth considering what fields we’ll wish we had known when time comes.

The obvious contenders:

  • Machine Learning
  • Machine Learning interpretability
  • All the Math of Intelligence that humanity has yet amassed [Probability theory Causality, etc.]

Some less obvious possibilities:

  • Neuroscience?
  • Geopolitics, if it turns out that which technical approach is ideal hinges on important facts about the balance of power?
  • Computer security?
  • Mechanism design in general?

Research methodology / Scientific “rationality”

We want the research teams tackling this problem in final crunch time to have the best scientific methodology and the best cognitive tools / habits for making research progress, that we can manage to provide them.

This maybe includes skills or methods in the domains of:

  • Ways to notice as early as possible if you’re following an ultimately-fruitless research path
  • Noticing / Resolving /Avoiding blindspots
  • Effective research teams
  • Original seeing / overcoming theory blindness / hypothesis generation
  • ???


One obvious thing is to spend time now, investing in habits and strategies for effective productivity. It seems senseless to waste precious hours in our acute crunch time due to procrastination or poor sleep. It is well worth in to solve those problems now. But aside from the general suggestion to get your shit in order and develop good habits now I can think of two more specific things that seem good to do.

Practice no-cost-too-large productive periods

There maybe trades that could make people more productive on the margin, but are too expensive in regular life. For instance, I think that I might conceivably benefit from having a dedicated person who’s job is to always be near me, so that I can duck with them with 0 friction. I’ve experimented a little bit with similar ideas (like having a list of people on call to duck with), but it doesn’t seem worth it for me to pay a whole extra person-salary to have the person be on call, and in the same building, instead of on-call via zoom.

But it is worth it at final crunch time.

It might be worth it to spend some period of time, maybe a week, maybe a month, every year, optimizing unrestrainedly for research productivity, with no heed to cost at all, so that we can practice how to do that. This is possibly a good thing to do anyway, because it might uncover trades that actually, on reflection are worth importing into my regular life.

Optimize rest

One particular subset of personal productivity, that jumps out at me: each person should figure out their actual optimal cadence of rest.

There’s a failure mode that ambitious people commonly fall into, which is working past the point when marginal hours of work are negative. When the whole cosmic endowment is on the line, there will be a natural temptation to push yourself to work as hard as you can, and forgo rest. Obviously, this is a mistake. Rest isn’t just a luxury: it is one of the inputs to productive work.

There is a second level of this error in which one, grudgingly, takes the minimal amount of rest time, and gets back to work. But the amount of rest time required to stay functional is not the optimal amount of rest, the amount the maximizes productive output. Eliezer mused years ago, that he felt kind of guilty about it, but maybe he should actually take two days off between research days, because the quality of his research seemed better on days when he happened to have had two rest days preceding.

In final crunch time, we want everyone to be resting the optimal amount that actually maximizes area under the curve, not the one that maximizes work-hours. We should do binary search now, to figure out what the optimum is.

Also, obviously, we should explore to discover highly effective methods of rest, instead of doing whatever random things seem good (unless, as it turns out, “whatever random thing seems good” is actually the best way to rest).

Picking up new tools

One thing that will be happening in this time, is there will be a flurry of new AI tools that can radically transform thinking and research, perhaps increasingly radical tools coming at a rate of once a month or faster.

Being able to take advantage of those tools and start using them for research immediately, with minimal learning curve, seems extremely high leverage.

If there are things that we can do that increase the ease of picking up new tools and using them to their full potential (instead of, as is common, using only the features afforded by your old tools and only very gradually

Some thoughts (probably bad):

  • Could we set up our workflows, somehow, such that it is easy to integrate new tools into them? Like if you already have a flexible, expressive research interface (something like Roam?), and you’re used to regular changes in capability to the backed of the interface?
  • Can we just practice? Can we have a competitive game of introducing new tools, and trying to orient to them and figure out how to exploit them creatively as possible?
  • Probably it should be some people’s full time job to translate cutting edge developments in AI into useful tools and practical workflows, and then to teach those workflows to the researchers?
  • Can we design a meta-tool that helps us figure out how to exploit new tools? Is it possible to train an AI assistant specifically for helping us get the most out of our new AI tools?
  • Can we map out the sort of constraints on human thinking and/or the the sorts of tools that will be possible, in advance, so that we can practice with much weaker versions of those tools, and get a sense of how we would use them, so that we’re ready when they arrive?
  • Can we try out new tools on psychedelics, to boost neuroplasticity? Is there some other way to temporarily weaken our neural priors? Maybe some kind of training in original seeing?

Staying grounded and stable in spite of the stakes

Obviously, being one of the few hundred people on whom the whole future of the cosmos rests, while the singularity is happening around you, and you are confronted with the stark reality of how doomed we are, is scary and disorienting and destabilizing.

I imagine that that induces all kinds of psychological pressures, that might find release in any of a number of concerning outlets: by deluding one’s self about the situation, by becoming manic and frenetic, by sinking into immovable depression.

We need our people to have the virtue of being able to look the problem in the eye, with all of its terror and disorientation, and stay stable enough to make tough calls, and make them sanely.

We’re called to cultivate a virtue (or maybe a set of virtues) of which I don’t know the true name, but which involve courage and groundless, and determination-without-denial.

I don’t know what is entailed in cultivating that virtue. Perhaps meditation? Maybe testing one’s self at literal risk to one’s life? I would guess that people in other times and places, who needed to face risk to their own lives and that of their families, did have this virtue, or some part of it, and it might be fruitful to investigate those cultures and how that virtue was cultivated.


Tinder hookups displace hookups that are more likely to lead to relationships?

[epistemic status: completely unverified hypothesis straight out of my ass. Many of these “facts” are subjective impressions that may turn out to be just untrue. Very sloppy fact checking.]

According to the Atlantic, we’re currently in the midst of a sex recession. Fewer people are having sex, and those that are are having less of it, in comparison to previous decades.

Furthermore, it looks to me that many in my generation are not on track to have kids and raise a family. [Note: that might be totally false. For one thing, people are marrying and having kids later, so maybe I just need to wait a half decade. There are articles saying that the birthrate is falling, but just eyeballing the graph, it looks like it has been hovering around 2 births per woman since 1972.]

I have the impression that fewer romantic pairings are happening. Fewer people are ending up in romantic relationships. Why might that be?

In a word, Tinder.

In 2012, Tinder was launched, and by the mid 2010s it had reached fixation. I posit that maybe swipe-based mobile apps had a number of large scale societal impacts that we’re just starting to see. Namely, that hook-up pairings that result from tinder like apps, are less likely to lead to long term relationships.

Pairing with people in your local social context increases the foder for a robust long term relationship

Before tinder, people hooked up with people in their local social environment: you met people in your dorm, or in the classes you were taking, or at your workplace, or in a bar, or at a party, or through friends.

But it seems that the popularity of tinder-like apps must have displaced at least some of that activity. Now, if you want to hook up, you’re more likely to do it via an app.

I would guess that if you hook up with someone who lives in the same dorm as you, that hookup is much more likely to transition to an ongoing relationship. If you’re hooking up with someone that you’ve already interacted with a good deal, there’s the possibility of being attracted to a partner on the basis of attributes like shared interests, or positive character traits like generosity or humor. In contrast, on Tinder, approximately the only criteria for mate-evaluation are 1) looks (as embodied in a photo) and 2) chat game.

Hooking up with someone that you like from your interactions with them seems much more likely to lead to a long term relationship, compared to hooking up with someone almost solely on the basis of their photo. Having met in person, the two of you are more likely to share a lot of common context (similar interests, overlapping social group, similar priorities), which can turn a recurring hookup into an actual relationship.

Heavy power laws of sexual success push against relationships

Secondly, tinder aggravates a power law distribution of male sexual success. There have always been “Chads”, who were particularly attractive to women. Men are, in general, less discriminating about sexual partners than women are, so those men would have casual sex with many partners. Many women are pairing with a few men, resulting in a sex-pairings graph with a small number of super-connectors, and a larger number of unconnected or loosely connected nodes (less attractive-to-women men, who are having much less sex than the population average).

However, I suspect that tinder-like apps further consolidate that distribution of sexual success.

Tinder has a much larger pool to filter than in-person context. Women, flipping through tinder, can choose to mach with only the very most attractive guys. In contrast, if they were going to a bar to hook up, they would, at best, be able to hook up with only the most attractive guy at that particular bar. And even then, only one woman at a bar could pair with the most attractive man at that bar at any given time, whereas on Tinder, a very successful guy could match with multiple women in a day, and have sex with all of them in sequence.

Hookups with highly sexually-successful chads seem unlikely to transition to long term relationships, because for those men, the opportunity cost of monogamy is much larger.

Additionally, even if those men do want to transition to a long-term relationship, they would only do it with (approximately) one out of hundreds of women. So, for most women hooking up with a hyper-sexually-successful man, the likelihood of that hookup transitioning into a relationship is very low.

Which means that tinder allows women, as a whole, to hook up with the same number of men as women 2 decades ago, and on average, their partners are more attractive to them, but less likely to pair-bond with them in a long term relationship.

Since the transaction costs for having sex are lower when you’re already in a relationship, fewer people ending up in relationships means that there is less sex happening overall. And fewer relationships means fewer people getting married and having kids.

Some predictions that this model makes and other hypotheses


  • We should see that the power-law of sexual success for men has moved to be closer to winner-take-all since 2012. More men are having no sex or close to no sex, but the men who are having a lot of sex are having more of it than their peers-from-the-previous-cohort (is there a words for this?).
  • Fewer hookups are happening via the “traditional channels”
    • As a corollary of this, men must either be asking women out in person less, or women must be saying “yes” (to sex, if not to dates) less, or both.

Everything that I’m saying here is also compatible with the hypothesis:

“Most men and women mostly aim for casual sex in their 20s, and steer toward looking for longer term relationships and marriage partners in their 30s. Tinder has all of these impacts in the first phase, but doesn’t influence the second phase much.”

This is possible, I suppose, but given the common trope of couples that met in college, it seem like over the past 50 years, long term committed relationships have evolved from more casual relationships, which I think often start as hookups.

This tweet from the author of a paper about how people meet their partners does seem to match this story.

It looks like online dating is displacing “meeting through friends”, “meeting through / as coworkers”, and “meeting in college.”

Social reasoning about two clusters of smart people

Here’s a sketch. All of the following are generalizations, and some are wrong.

There are rationalists.

The rationalists are unusually intelligent, even I think, for the tech culture that is their sort of backdrop. But they are, by-and-large kind of aspy: on the whole, they are weak on social skills, or their is something broken about their social perceptions (broken in a different way for each one).

Rationalists rely heavily on explicit reasoning, and usually start their journeys pretty disconnected from their bodies.

They are strongly mistake theorists.

They have very very strong STEM-y epidemics. They can follow, and are compelled by arguments. They are masterful at weighing evidence and coming to good conclusions on uncertain questions, where the there is something like a data-set or academic evidence base.

They are honest.

They generally have a good deal of trust and assumption of good faith about other people, or they are cynical of humans and human behavior, using (explicit) models of “signaling” and “evo pysch.”

I think they maybe have a collective blindspot with regards to Power, and are maybe(?) gullible (related to the general assumption about good faith). I suspect that rationalists might find it hard to generate the hypothesis that “this real person right in front of me, right now, is lying to me / trying to manipulate me.”

They are, generally, concerned about ex-risk from advanced AI, and track that as the “most likely thing to kill us all”.


There’s also this other cluster of smart people. This includes Leverage-people, and some Thiel people, and some who call themseleves post rationalists.

They are more “humanities” leaning. They probably think that lots of classic philosophy is not only good, but practically useful (where some rationalists would be apt to deride that as the “rambling of dead fools”).

They are more likely to study history or sociology, than math or Machine Learning.

They are keenly aware of the importance of power and power relations, and are better able to take ideology as object, and treat speech as strategic action rather than mere representation of belief.

Their worldview emphasizes “skill”, and extremely skilled people, who shape the world.

They are more likely to think of “beliefs” as having a proper function doing something other than reflecting the true state of the world, for instance, facilitating coordination, or producing an effective psychology. The rationalist would think of instrumentally useful false beliefs as something that is kind of dirty.

They tend to get some factual questions wrong (as near as I can tell): one common one is disregarding IQ, and positing that all mental abilities are a matter of learning.

These people are much more likely to think that institutional decay or civilizational collapse is more pressing than AI.


It seems like both these groups have blindspots, but I would really like to have a better sense of the likelyhood of both of these disasters, so it would be good if we could get all the virtues into one place, to look at both of them.



A view of the main kinds of problems facing us

I’ve decided that I want to to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with.

This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.

There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have have the problem of civilizational sanity.

Preparadigmatic science

There are a number of hard scientific or scientific-philosophical problems that we’re facing down as a species.

Most notably, the problem of AI alignment, but also finding technical solutions to various risks caused by bio-techinlogy, possibly getting our bearings with regards to what civilization collapse means and how it is likely to come about, possibly getting a handle on the risk of a simulation shut-down, possibly making sense of the large scale cultural, political, cognitive shifts that are likely to follow from new technologies that disrupt existing social systems (like VR?).

Basically, for every x-risk, and every big shift to human civilization, there is work to be done even making sense of the situation, and framing the problem.

As this work progresses it eventually transitions into incremental science / engineering, as the problems are clarified and specified, and the good methodologies for attacking those problems solidify.

(Work on bio-risk, might already be in this phase. And I think that work towards human genetic enhancement is basically incremental science.)

To my rough intuitions, it seems like these problems, in order of pressingness are:

  1. AI alignment
  2. Bio-risk
  3. Human genetic enhancement
  4. Social, political, civilizational collapse

…where that ranking is mostly determined by which one will have a very large impact on the world first.

So there’s the object-level work of just trying to make progress on these puzzles, plus a bunch of support work for doing that object level work.

The support work includes

  • Operations that makes the research machines run (ex: MIRI ops)
  • Recruitment (and acclimation) of people who can do this kind of work (ex: CFAR)
  • Creating and maintaining infrastructure that enables intellectually fruitful conversations (ex: LessWrong)
  • Developing methodology for making progress on the problems (ex: CFAR, a little, but in practice I think that this basically has to be done by the people trying to do the object level work.)
  • Other stuff.

So we have a whole ecosystem of folks who are supporting this preparadgimatic development.

Civilizational Sanity

I think that in most worlds, if we completely succeeded at the pre-paradigmatic science, and the incremental science and engineering that follows it, the world still wouldn’t be saved.

Broadly, one way or the other, there are huge technological and social changes heading our way, and human decision makers are going to decide how to respond to those changes, possibly in ways that will have very long term repercussions on the trajectory of earth-originating life.

As a central example, if we more-or-less-completly solved AI alignment, from a full theory of agent-foundations, all the way down to the specific implementation, we would still find ourselves in a world, where humanity has attained god-like power over the universe, which we could very well abuse, and end up with a much much worse future than we might otherwise have had. And by default, I don’t expect humanity to refrain from using new capabilities rashly and unwisely.

Completely solving alignment does give us a big leg up on this problem, because we’ll have the aid of superintelligent assistants in our decision making, or we might just have an AI system implement our CEV in classic fashion.

I would say that “aligned superintelligent assistants” and “AIs implementing CEV”, are civilizational sanity interventions: technologies or institutions that help humanity’s high level decision-makers to make wise decisions in response to huge changes that, by default, they will not comprehend.

I gave some examples of possible Civ Sanity interventions here.

Also, think that some forms of governance / policy work that OpenPhil, OpenAI, and FHI have done, count as part of this category, though I want to cleanly distinguish between pushing for object-level policy proposals that you’ve already figured out, and instantiating systems that make it more likely that good policies will be reached and acted upon in general.

Overall, this class of interventions seems neglected by our community, compared to doing and supporting preparadigmatic research. That might be justified. There’s reason to think that we are well equipped to make progress on hard important research problems, but changing the way the world works, seems like it might be harder on some absolute scale, or less suited to our abilities.