April 2021 – musings and rough drafts

Please note that while I make claims about what I understand other people to believe, I don’t speak for them, and might be mistaken. I don’t represent MIRI or CFAR, and they might disagree. The opinions expressed are my own.

Someone asked me why I am broadly “pessimistic”. This post, which is an articulation of an important part of my world view, came out of my answer to that question.

Two kinds of problems:

When I’m thinking about world scale problems, I think it makes sense to break them down into two categories. These categories are nebulous (like all categories in the real world), but I also think that these are natural clusters, and most world scale problems will clearly be in one or the other.

Category I Problems

Let’s say I ask myself “Will factory farming have been eliminated 100 years from now?” (Setting aside x-risk for a moment) I can give a pretty confident “Yes.”

It looks to me that we have a pretty clear understanding of what needs to happen to eliminate factory farming. Basically, we need some form of synthetic meat (and other animal products) that is comparable to real meat in taste and nutrition, and that costs less than real meat to produce. Once you have that, most people will prefer synthetic meat to real meat, demand for animal products will drop off markedly, and the incentive for large scale factory farming will vanish. At that point the problem of factory farming is basically solved.

Beyond and Impossible meats are already pretty good, maybe even good enough to satisfy the typical consumer. That gives us proof of concept. So eliminating factory farming is almost just a matter of making the cost of synthetic meat < the cost of real meat.

There’s an ecosystem of companies (notably Impossible and Beyond, plus some cultured meat startups) that are iteratively making progress on exactly the problem of driving down the cost. Possibly, it will take longer than I expect, but unless something surprising happens, that ecosystem will succeed eventually.

In my terminology, I would say that there exists a progress–machine that is systematically ratcheting towards a solution to factory farming. There is a system that is making incremental progress towards the goal, month by month, year by year. If things continue as they are, and there aren’t any yet-unseen fundamental blockers, I have strong reason to think that the problem will be fully solved eventually.

Now, this doesn’t mean that the problem is already solved. Figuring out ways to make this machine succeed marginally faster is still really high value, because approximately a billion life-years are spent in torturous conditions in factory farms every year. If you can spend your career speeding up the machine by six months, that prevents at least half a billion life-years of torture (more than that if you shift the rate of progress instead of just translating the curve to the left).

Factory farming is an example of what (for lack of a better term), I will call a “category I” problem. [tentative name “turn-crank problems”]

Category II Problems

In contrast, suppose I ask “Will war have been eliminated 100 years from now?” To this question, my answer has to be “I don’t know, but probably not?”

That’s not because I think that ending war is impossible. I’m pretty confident that there exists some set of knowledge (of institution design? Of game theory? Of diplomacy?) with which we could construct a system that robustly avoided war, forever.

But in my current epistemic state, I don’t even have a sketch of how to do it. There isn’t a well specified target that if we hit it, we’ll have achieved victory (in the way that “cost of synthetic meat < cost of real meat” is a victory criterion). And there is no machine that is systematically ratcheting towards progress on eliminating war.

That isn’t to say that there aren’t people working on all kinds of projects which are trying to help with the problem of “war”, or reducing the incidence of war (peace talks, education, better international communication, what have you). There are many people, in some sense, working hard at the problem.

But their efforts don’t cohere into a mechanism that reliably produces incremental progress towards solving the problem. Or said differently, there are things that people can do that help with the problem on the margin, but those marginal improvements don’t “add up to” a full solution. It isn’t the case that if we do enough peace talks and enough education and enough translation software, war will be permanently solved.

In a very important sense, humanity does not have traction on the problem of ending war, in a way that it does have traction on the problem of ending factory farming.

Ending war isn’t impossible in principle, but there are currently no levers that we can pull to make systematic progress towards a solution. We’re stuck doing things that might help a little on the margin, but we don’t know how to set up a system such that if that machine runs long enough, war will be permanently solved.

I’m going to call problems like this, where there does not exist a machine that is making systematic progress, a “category II” problem. [tentative name “non-turn-crank problems”]

Category I vs. Category II

Some category I problems:

Factory Farming
Global Poverty
“Getting off the rock” (but possibly only because Elon Musk was born, and took personal responsibility for it)
Getting a computer to almost every human on earth
Legalizing / normalizing gay marriage
Eradicating Malaria
Possibly solving aging?? (and if so, probably only because a few people in the transhumanism crowd took personal responsibility for it)

Some category II problems:

War
AI alignment
Global coordination
Civilizational sanity / institutional decision making (on the timescale of the next century)
Civilizational collapse
Achieving stable, widespread mental health
Political polarization
Cost disease

If you can draw a graph that shows the problem more-or-less steadily getting better over time, it’s a category I problem. If progress is being made in some sense, but progress happens in stochastic jumps, or it’s non-obvious how much progress was made in a particular period, it’s a category II problem.

In order for factory farming to not be solved, something surprising, from outside of my current model, needs to happen. (Like maybe technological progress stopping, or a religious movement that shifts the direction of society.)

Whereas in order for war to be solved, something surprising needs to happen. Namely, there needs to be some breakthrough, a fundamental change in our understanding of the problem, that gives humanity traction on the problem, and enables the construction of a machine that can make systematic, incremental progress.

(Occasionally, a problem will be in an in-between category, where there doesn’t yet exist a machine that is making reliable progress on the problem, but that isn’t because our understanding of the shape of the problem is insufficient. Sometimes the only reason there isn’t a machine doing this work is that no person or group of sufficient competence has taken heroic responsibility for getting a machine started.

For instance, I would guess that our civilization is capable of making steady, incremental progress on the effectiveness of cryonics, up to the point of cryonics being a reliable, functional technology. But progress on “better cryonics” is mostly stagnant. I think that the only reason there isn’t a machine incrementally pushing on making cryonics work is that no one (or at least no one of sufficient skill) has taken it upon themselves to solve that problem, thereby jumpstarting a machine that makes steady incremental progress on it. [ 1 ]

It is good to keep in mind that these category 1.5 problems exist, but they mostly don’t bear on the rest of this analysis.)

Maybe the most important thing: Category I problems get solved as a matter of course. Category II problems get solved after we stumble into a breakthrough that turns them into category I problems.

In this context, a “breakthrough” is “some change, either in our understanding (the map) or in the world (the territory), that causes the shape of the problem to shift, such that humanity can now make reliable, systematic progress towards a solution, unless something surprising happens.”

Properties of Category I vs. Category II problems

Category I problems	Category II problems
There exists a “machine” that is making systematic, reliable progress on the problem	There isn’t a “machine” making systematic, reliable progress on the problem, and humanity doesn’t yet know how to make such a machine
Marginal improvements can “add up” to a full solution to the problem	Marginal improvements don’t “add up” to a full solution to the problem
The problem will be solved, unless something surprising happens	The problem won’t be solved until something surprising happens
We’re mostly not depending on luck	We’re substantially depending on luck
Progress is usually incremental	Progress is usually stochastic
Progress is pretty “concrete”; it is relatively unlikely that some promising project will turn out to be completely useless	Progress is “speculative”; it is a live possibility that any given piece of work that we consider progress will later prove completely useless in light of better understanding
The bottleneck to solving the problem is the machine going better or faster	The bottleneck to solving the problem is a breakthrough, defined as “some shift (either in our map or in the territory) that changes the shape of the problem enough that we can make reliable systematic progress on it”
Solving the problem does not require any major conceptual or ontological shifts; progress consists of many, constrained, engineering problems	Our understanding of the problem, or ontology of the problem, will change at least once, but most likely many times, on the path to a full solution
There might be graphs that show the problem getting better, more-or-less steadily, over time.	It’s quite hard to assess how much progress is made in a given unit of time, or even if exciting “milestones” actually constitute progress
We know how to train people to fill roles in which they can reliably contribute to progress on the problem; mostly what is needed is effective people to fill those roles	We have only shaky and tenuous knowledge of how to train people to make progress on the problem; mostly what’s needed is people who can figure out for themselves how to get oriented
If all the relevant inputs were free, the problem would be solved or very close to solved (e.g. with a perpetual motion machine that produced arbitrarily large amounts of the ingredients to Impossible meat, factory farming would be over)	If the inputs were free, this would not solve the problem (e.g. with a hypercomputer, AI safety would not be solved)


Luck and Deadlines

In general, I’m optimistic about category I problems, and pessimistic about category II problems (at least in the short term).

And crucially, humanity doesn’t know how to systematically make progress on intentionally turning a given category II problem into a category I problem.

We’re not hopeless at it. It is not completely a random walk. Individual geniuses have sometimes applied themselves to a problem that we were so confused about as to not have traction on it, and produced a breakthrough that gives us traction. For instance, Newton giving birth to mathematicized physics. [Note: I’m not sure that this characterization is historically correct.]

But overall, when those breakthroughs occur, it tends to be in large part due to chance. We mostly don’t know how to make breakthroughs, on particular category II problems, on demand.

Which is to say, any given problem transitioning from II to I depends on luck.

And unfortunately, it seems like some of the category II problems I listed above 1) are crucial to the survival of civilization, and 2) have deadlines.

From my epistemic vantage point, it looks like if we don’t solve some subset of those problems before some unknown deadline (possibly as soon as this century), we die. That’s it. Game over.

Human survival depends on solving some problems for which we currently have no levers. There is nothing that we can push on to make reliable, systematic progress. And there’s no machine making reliable, systematic progress.

Absent a machine that is systematically ratcheting forward progress on the problem, there’s no strong reason to think that it will be solved.

Or to state the implicit claim more explicitly:

Large scale problems are solved only when there is a machine incrementally moving towards a solution.
There are a handful of large scale problems that seem crucial to solve in the medium term.
There aren’t machines incrementally moving towards solutions to those problems.

So by default, unless something changes, I expect that those problems won’t be solved.

On AI alignment

There are people who dispute that AI risk is a category II problem, and they are accordingly more optimistic. I believe that Rohin Shah and Paul Christiano both think that there’s a pretty good chance that business-as-usual AI development will solve alignment as a matter of course, because alignment problems are on the critical path to making functioning AI.

That is, they think that there is an existing machine that is solving the problem: the AI/ML field itself.

If I understand them correctly, they both think that there is a decent chance that their EA-focused alignment work will have been counterfactually irrelevant in retrospect, but it still seems like a good altruistic bet to lay groundwork for alignment research now.

In the terms of my ontology here, they see themselves as shoring up the alignment-progress machine, or helping it along with some differential progress, just in case the machine turns out to have been inadequate to the task of solving the problem before the deadline. Even though they think that there is a substantial chance that their work will, in retrospect, turn out to have been counterfactually irrelevant, because getting the AI alignment problem right seems really high leverage for the value of the future, it is a good altruistic bet to do work that makes it more likely that the machine will succeed, on the margin.

This is in marked contrast to how I imagine the MIRI leadership is orienting to the problem: When they look at the world, it looks to them like there isn’t a machine ratcheting towards safety at all. Yes, there are some alignment-like problems that will be solved in the course of AI development, but largely by patches that invite nearest-unblocked strategy problems, and which won’t generalize to extremely powerful systems. As such, MIRI is [edit 2023: was] making a desperate effort to make or to be a machine that ratchets toward progress on safety.

I think this question of “is there a machine that is ratcheting the world towards more AI safety”, is one of the main cruxes between the non-MIRI optimists, and the MIRI-pessimists, which is often overshadowed by the related, but less crucial question of “how sharp will takeoff be?”

On “rationality”

Over the past 3 years, I have regularly taught at AIRCS workshops. These are mainly a recruitment vehicle for MIRI, run as a collaboration between MIRI and CFAR.

At AIRCS workshops, one thing that we say early on is that AI safety is a “Preparadigmatic field”, which is more or less the same as saying that AI alignment is a category II problem. AI safety as a field hasn’t matured to the point that there are known levers for making progress.

And, we explain, we’re going to teach some rationality techniques at the workshop, because those are supposed to help one orient to a preparadigmatic field.

Some people are skeptical that these funny rationality methods are helpful at all (which, to be clear, I think is a quite reasonable position). Other people give the opposite critique, “it seems like clear thinking and effective communication and generally making use of all your mind’s abilities, is useful in all fields, not just preparadigmatic ones.”

But this is missing the point slightly. It isn’t so much that these tools are particularly helpful for preparadigmatic fields; it’s that in preparadigmatic fields, this is the best we can provide.

More developed fields have standard methods that are known to be useful. We train aspiring physicists in calculus, because we have ample evidence that calculus is an extremely versatile tool for making progress on physics, for example. [another example would be helpful here]

We don’t have anything like that for AI safety. There are not yet standard tools in the AI safety toolbox that we know to be useful and that everyone should learn. We don’t have that much traction on the problems.

So as a backstop, we teach very general principles of thinking and problem solving, as a sort of “on ramp” to thinking about your own thinking and how you might improve your own process. The hope is that this will translate into skill in getting traction on a confusing domain that doesn’t yet have standard methods.

When you’re flailing, and you don’t have any kind of formula for making research progress, it can make sense to go meta and think about how to think about how to solve problems. But if you can just make progress on the object level, you’re almost certainly better off doing that.

People sometimes show up at AIRCS workshops expecting us to give them concrete technical problems that they can try and solve, and are sometimes discouraged that instead we’re teaching these woo-y or speculative psychological techniques.

But, by and large, we DON’T have concrete, well-specified, technical problems to solve (note that this is a somewhat contentious claim, see the bit about AI safety above). The work that needs to be done is something like “wandering around in one’s confusion in such a way that one can crystallize well-specified technical problems.” And how to do that is very personal and idiosyncratic: we don’t have systematized methods for doing that, such that someone can just follow the instructions and get the desired result. But we’ve found that the woo-y tools seem to give people new levers and new perspectives for figuring out how to do this for themselves, so that’s what we have to share.

As a side note: I have a gripe that “rationality” has come to conflate two things: there’s “how to make progress on natural philosophy when you don’t have traction on the problem” and, separately, there’s “effective decision-making in the real world”. These two things have some overlap, but they are really pretty different. And I think that development on both of them has been hurt by lumping them under one label.

On EA

If I were to offer a critique of Effective Altruism it would be this: EA in general doesn’t distinguish between category I and category II problems.

Of course, this doesn’t apply to every person who is affiliated with EA. But many EAs, and especially EA movement builders, implicitly think of all problems as category I problems. That is, they are implicitly behaving as if there exists a machine that will convert resources (talent, money, attention) into progress on important problems.

And, as I said, there are many problems for which there does exist a machine doing that. But in cases where there isn’t such a machine, because the crucial breakthrough that would turn the problem from category II to category I hasn’t occurred yet, this is counterproductive.

The sort of inputs that allow a category I problem-solving machine to go better or faster, are just very different from the sort of inputs that make it more likely that humanity will get traction on a category II problem.

Ease of Absorbing Talent

For one thing, having more people is often helpful for solving a category I problem, but is usually not helpful for getting traction on a category II problem. Machines solving category I problems can typically absorb people, because (by dint of having traction) they are able to train people to fill useful roles in the machine.

Folks trying to get traction on a category II problem, by definition, don’t have systematic methods by which they can make progress. So they can’t easily train people to do known-to-be-useful work.

I think there are clusters that are able to make non-0 progress on getting traction, and that those clusters can sometimes usefully absorb people, but they basically need to be people that have a non-trivial ability to get traction themselves. Because the work to be done is trying to get traction on the problem, it doesn’t help much to have more people who are waiting to be told what to do: the thing that they need to do is figure out what to do. [ 2 ]

Benefit of Marginal Improvements

For another thing, because machines solving category I problems can generally absorb resources in a useful way, and because they are making incremental progress, it can be useful to nudge people’s actions in the direction of a good thing, without them shifting all the way to doing the “optimal” thing.

Maybe someone won’t go vegan, but they might go vegetarian. Or maybe a company won’t go vegetarian, but it can be persuaded to use humanely farmed eggs.
Maybe this person won’t change their whole career-choice, but they would be open to choosing more impact oriented projects within their career.
Maybe most people won’t become hard-core EAs, but if many people change their donations to be more effective on the margin, that seems like a win.
Maybe an AI development company won’t hold back the deployment of their AI system for years to find a way to ensure that it is aligned, but it can be convinced to hire a safety team.

For category I problems, interventions on the margin “add up to” significant progress on the problem. What it means for a problem to be category I is that there are at least some forms of marginal improvement that, in aggregate, solve the problem.

But in the domain of category II problems, marginal nudges like this are close to useless.

Because there is not a machine, to which people can contribute, that will incrementally make progress on the problem, getting people to be somewhat more aware of the problem, or care a little about the problem, doesn’t do anything meaningful.

In the domain of a category II problem, the thing that is needed is a breakthrough (or a series of breakthroughs) that will turn it into a category I problem.

I don’t know how to make this happen in full generality, but it looks a lot closer to a small number of highly talented, highly-invested people who are working obsessively on the problem than it looks like a large mass of people who are aware that the problem is important and will make marginal changes to their lives to help.

Consider a machine learning researcher who is not interested in really engaging with the hard part of the problem of AI safety, because that would require backchaining from bizarre-seeming science-fiction scenarios, but who is working on a career-friendly paper that he has rationalized, by way of some hand-wavy logic, as “maybe being relevant to AI safety someday”. Such a researcher is, it seems to me, quite unlikely to uncover some crucial insight that leads to a breakthrough on the problem.

Even a researcher who is sincerely trying to help with AI safety, whose impact model is “I will get a PhD, and then publish papers about AI safety” is, according to me, extremely unlikely to produce a breakthrough. Such a person is acting as if there is a role that they can fill, and if they execute well in that role, this will make progress on the problem. They’re acting as if they are part of an existing solution machine.

But, as discussed, that’s not the way it works: we don’t know how to make progress on AI safety, there aren’t straightforward things to do that will reliably make progress. The thing that is needed is people who will take responsibility for independently attempting to figure out how to make progress (which, incidentally, involves tackling the whole problem in its entirety).

If a person is not thinking at all about how to get traction on the core problem, but is just doing “the sort of activities that an AI safety researcher does”, I think it is very unlikely that their activity is going to lead to the kind of groundbreaking work that changes our understanding of the problem enough that AI alignment flips from being in category II to being in category I.

In these category II cases, shifts in behavior, on the margin, do not add up to progress on the problem.

EA’s causal history

Not making a distinction between category I and category II problems, EA, as a culture, has a tendency to try to solve all problems as if they are category I problems, namely by recruiting more people and directing them at AI alignment or AI policy or whatever.

This isn’t that surprising to me, because I think a large part of the memetic energy of EA came from the identification of a specific category I problem: There was a claim that a first world person could easily save lives by donating to effective charities focused on health interventions in the third world.

Insofar as this is true, there’s a strong reason to recruit as many people to be engaged with EA as possible: the more people involved, the more money moved, the more lives saved.

However, in the years since EA was founded, the intellectual core of the movement updated in two ways:

Firstly, it now seems more dubious (to me at least) that lives can be cheaply and easily saved that way. (I’m much less confident of this point, and it isn’t a crux for me, so I’ve removed discussion of it to an endnote.[ 3 ] )

And more importantly, EA realized that there are vastly more important problems than global poverty.

X-risk has been a thread in EA discourse from the beginning: one of the three main intellectual origins of EA was LessWrong (the other two being GiveWell, coming from the finance world, and Giving What We Can / 80,000 Hours, stemming from some Oxford philosophers). But sometime around 2014 the leadership of EA settled on long-termism and x-risk as the “most important thing”. (I was part of the volunteer team for EAG 2015, and I saw firsthand that there was an explicit push to prioritize x-risk.)

Over recent years that shift has taken form: deprioritizing earning to give, but promoting careers in AI policy, for instance.

I claim that this pivot represents a more fundamental shift than most in EA realize. Namely, a shift from EA being the sort of thing that is attempting to make progress on a category I problem to EA being the sort of thing attempting to make progress on a category II problem.

EA developed as a movement for making progress on a category I problem: It had as a premise that ordinary people can do a lot of good by moving money to (mostly) pre-existing charities, and by choosing high impact “careers” (where a “career” implies an already-mapped out trajectory). Category I orientation is implicit in EA’s DNA.

And so when the movement tries to make the pivot to focusing on x-risk, it implicitly orients to that problem as if it were a category I problem: “Where can we direct money and talent to make an impact on x-risk?”

For all of the reasons above, I think this is a fundamental error: the inputs that lead to progress on a category I problem are categorically different than those that lead to progress on a category II problem.

To state my view starkly, if somewhat uncharitably, EA is essentially shoveling resources into a non-existent x-risk progress machine, imagining that if they shovel enough, this will lead to progress on x-risk and the other core problems of the world. As I have said, I think that there isn’t yet a machine that can make consistent incremental progress in that way.

But it would be pretty hard, I think, for EA to do something different. This isn’t a trivial error to correct: changing this would entail changing the fundamental nature of what EA is and how it orients to the world.

Footnotes:

[ 1 ] – I’ve heard Nate Soares refer to “the curse of cryonics”, which is that anyone who has enough clear-thinking, independent thought to realize that cryonics is important can also see that there are vastly more important problems.

[ 2 ] – I think that this is a little bit of an oversimplification. I think there are ways that people can contribute usefully in a mode that is close to “executing on what some people they trust think is a good idea”, but you do need a critical mass of people who are clawing for traction themselves, or this doesn’t work. Therefore the limiting reagent is people clawing for traction, and capacity to absorb conscientious ability-to-execute talent is limited.

[ 3 ] – The world is really complicated, and I’m not sure how confident to be that our charitable interventions are working. This post by Ben Hoffman pointing out that the expected value distribution for deworming interventions trends into the negative, but most EAs don’t seem aware of this, seems on point.

Further (though this is my personal opinion, more than any kind of consensus), the argument Eliezer makes here is extremely important for anyone taking aim at eradicating poverty. If there is some kind of control system that keeps people poor, regardless of the productivity of society, this suggests that there might be some fundamental blocker to ending poverty: until that control system is addressed somehow, all of the value ostensibly created by global health interventions is being sucked away.

(Admittedly, this argument holds somewhat less force if one is aiming simply to reduce human suffering in the next year, rather than any kind of long term goal like “permanently and robustly end poverty.”)

Considerations like that one suggest that we should be much less certain that our favored global poverty interventions are effective. Instead of (or perhaps, in addition to) rushing to move as many resources to those interventions as possible, it seems like the more important thing is to continue trying to puzzle out, via experiments and modeling and whatever tools we have, how the relevant social systems work, to verify that we’re actually having the positive effects that we’re aiming for. It seems to me that even in the domains of global poverty, we still need a good deal of exploration relative to exploitation of the interventions we’ve uncovered.

Relatedly, it seems to me that focusing on charity is somewhat myopic: it is true that there is a machine eradicating poverty, but that machine is called capitalism, not charity donation. Maybe the charity donations help, but I would guess that if you want to really have the most impact here, the actual thing to do is not give to charities but something more sophisticated that engages more with that machine. (I might be wrong about that. Maybe in fact global health interventions are actually the best way to unblock the economic engine so that capitalism can pull the third world out of poverty faster.)



On category I and category II problems