There’s a key question that should govern the way we engage with the world: “Which things have increasing marginal returns, and which things have decreasing marginal returns?”
Sometimes people are like, “I’ll compromise between what I like doing and what has impact, finding something that scores pretty well on both.” Or they’ll say, “I was already planning to get a PhD in [field] / run a camp for high schoolers / do personal development training / etc. I’ll make this more EA on the margin.”
They are doomed. There’s a logistic success curve, and there are orders of magnitude differences in the impact of different interventions. Which problem you work on is by far the most important determinant of your impact on the world, since most things don’t matter very much at all. The difference between the best project you can find and some pretty good project is often so large as to swamp every other choice that you make. And within a project, there are a bunch of choices to be made, which themselves can make orders of magnitude of difference, often the difference between “this is an amazing project” and “this is basically worthless.”
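A toy model can make the “logistic success curve” concrete. The sketch below assumes, purely for illustration, that a project's probability of success is a logistic function of how fully it is optimized for impact. The 0-10 score, the midpoint, and the steepness are hypothetical values, not estimates.

```python
import math


def success_probability(optimization: float,
                        midpoint: float = 5.0,
                        steepness: float = 1.5) -> float:
    """Toy logistic success curve (all parameters hypothetical).

    `optimization` is an arbitrary 0-10 score for how single-mindedly a
    project is aimed at impact; `midpoint` is where success is a coin flip.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (optimization - midpoint)))


fully_optimized = success_probability(8.0)
compromise = success_probability(4.0)  # half the optimization score
print(f"fully optimized: {fully_optimized:.2f}")  # 0.99
print(f"compromise:      {compromise:.2f}")       # 0.18
```

In this toy model, halving the optimization score cuts the chance of success by more than a factor of five, because the compromise lands on the flat lower part of the curve. Other parameters give other numbers. The only point is that on a logistic curve, partial effort can be worth far less than a proportional share.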
On the other hand, sometimes people push themselves extra hard to work one more hour at the end of the day when they’re tired and flagging. Often, but not always, they are pwned. Your last hour of the day is not your best hour of the day. Very likely, it’s your worst hour. For most of the work that we have to do, higher-quality hours are superlinearly more valuable than lower-quality hours. Slowly burning yourself out, or limiting your rest (which might curtail your highest-quality hours tomorrow), so that you can eke out one more low-quality hour of work, is a bad trade. You would be better off not worrying that much about it, and you definitely shouldn’t be taking on big costs for “optimizing your productivity” if “optimizing your productivity” is mainly about getting in increasingly low-marginal-value work hours.
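A toy model can illustrate the claim that higher-quality hours are superlinearly more valuable. Every number here is an assumption for illustration. Hour quality is taken to fall linearly from 1.0 to 0.1 over a ten-hour day, and an hour's value is taken to scale with the square of its quality.

```python
def hour_quality(hour: int, hours_in_day: int = 10) -> float:
    """Hypothetical quality of the given hour (0-indexed), from 1.0 down to 0.1."""
    return 1.0 - 0.9 * hour / (hours_in_day - 1)


def hour_value(quality: float, exponent: float = 2.0) -> float:
    """Hypothetical superlinear value: doubling quality more than doubles value."""
    return quality ** exponent


values = [hour_value(hour_quality(h)) for h in range(10)]
print(f"first hour: {values[0]:.2f}, last hour: {values[-1]:.2f}")  # 1.00, 0.01
print(f"last hour's share of the day's value: {values[-1] / sum(values):.1%}")  # 0.3%
```

Under these assumptions, the last hour delivers about 1% of the first hour's value. Trading away rest, and with it some of tomorrow's best hours, to buy that last hour is a poor exchange. With a linear value function (exponent 1.0), the last hour would still be worth a tenth of the first. That difference is why the shape of the returns curve matters.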
Some variables have increasing marginal returns. We need to identify those, so that we can optimize on them as aggressively as we can, including making hard sacrifices to climb a little higher on the ladder.
Some variables have decreasing marginal returns. We want to identify those so that we can sacrifice on them, and otherwise not spend much attention or effort on them.
Getting which is which right is crucial. Messing up in either direction can leak most of the potential value. Given that, it seems like more people attempting to be ambitiously altruistic should be spending more cognitive effort trying to get this question right.
Please note that while I make claims about what I understand other people to believe, I don’t speak for them, and might be mistaken. I don’t represent MIRI or CFAR, and they might disagree. The opinions expressed are my own.
Someone asked me why I am broadly “pessimistic”. This post, which is an articulation of an important part of my world view, came out of my answer to that question.
Two kinds of problems:
When I’m thinking about world scale problems, I think it makes sense to break them down into two categories. These categories are nebulous (like all categories in the real world), but I also think that these are natural clusters, and most world scale problems will clearly be in one or the other.
Category I Problems
Let’s say I ask myself, “Will factory farming have been eliminated 100 years from now?” Setting aside x-risk for a moment, I can give a pretty confident “Yes.”
It looks to me like we have a pretty clear understanding of what needs to happen to eliminate factory farming. Basically, we need some form of synthetic meat (and other animal products) that is comparable to real meat in taste and nutrition, and that costs less than real meat to produce. Once you have that, most people will prefer synthetic meat to real meat, demand for animal products will drop off markedly, and the incentive for large scale factory farming will vanish. At that point the problem of factory farming is basically solved.
Beyond and Impossible meat are already pretty good, maybe even good enough to satisfy the typical consumer. That gives us proof of concept. So eliminating factory farming is almost just a matter of making the cost of synthetic meat < the cost of real meat.
There’s an ecosystem of companies (notably Impossible and Beyond, plus some cultured meat startups) that are iteratively making progress on exactly the problem of driving down the cost. Possibly, it will take longer than I expect, but unless something surprising happens, that ecosystem will succeed eventually.
In my terminology, I would say that there exists a progress machine that is systematically ratcheting towards a solution to factory farming. There is a system that is making incremental progress towards the goal, month by month, year by year. If things continue as they are, and there aren’t any yet-unseen fundamental blockers, I have strong reason to think that the problem will be fully solved eventually.
Now, this doesn’t mean that the problem is already solved. Figuring out ways to make this machine succeed marginally faster is still really high value, because approximately a billion life-years are spent in torturous conditions in factory farms every year. If you can spend your career speeding up the machine by six months, that prevents at least half a billion life-years of torture (more than that if you shift the rate of progress instead of just translating the curve to the left).
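The arithmetic in the paragraph above can be written out as a back-of-the-envelope sketch. It rests on stated assumptions:

- Suffering accrues at a constant billion life-years per year (the post's figure).
- Progress toward the solution is linear in time.
- The 30-year horizon and 5% speedup in the second example are hypothetical. They are chosen only to show the difference between translating the curve and changing its slope.

```python
LIFE_YEARS_PER_YEAR = 1e9  # the post's rough figure for factory farming


def saved_by_translation(years_earlier: float) -> float:
    """Life-years saved if the whole progress curve is shifted left."""
    return years_earlier * LIFE_YEARS_PER_YEAR


def saved_by_rate_increase(years_to_solution: float, rate_gain: float) -> float:
    """Life-years saved if progress runs (1 + rate_gain) times faster from now on.

    Assumes linear progress, so the solution date moves from T to T / (1 + rate_gain).
    """
    new_years = years_to_solution / (1 + rate_gain)
    return (years_to_solution - new_years) * LIFE_YEARS_PER_YEAR


print(f"{saved_by_translation(0.5):.2e}")  # 5.00e+08
# Hypothetical: 30 years to a solution, machine made 5% faster.
print(f"{saved_by_rate_increase(30, 0.05):.2e}")  # 1.43e+09
```

Translating the curve saves a fixed amount no matter how far off the solution is. A rate increase saves more the longer the remaining road, which is the sense in which shifting the rate can be worth more than a one-time head start.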
Factory farming is an example of what, for lack of a better term, I will call a “category I” problem. [tentative name “turn-crank problems”]
Category II Problems
In contrast, suppose I ask “Will war have been eliminated 100 years from now?” To this question, my answer has to be “I don’t know, but probably not?”
That’s not because I think that ending war is impossible. I’m pretty confident that there exists some set of knowledge (of institution design? Of game theory? Of diplomacy?) with which we could construct a system that robustly avoided war, forever.
But in my current epistemic state, I don’t even have a sketch of how to do it. There isn’t a well-specified target such that, if we hit it, we’ll have achieved victory (in the way that “cost of synthetic meat < cost of real meat” is a victory criterion). And there is no machine that is systematically ratcheting towards progress on eliminating war.
That isn’t to say that there aren’t people working on all kinds of projects which are trying to help with the problem of “war”, or reducing the incidence of war (peace talks, education, better international communication, what have you). There are many people, in some sense, working hard at the problem.
But their efforts don’t cohere into a mechanism that reliably produces incremental progress towards solving the problem. Or said differently, there are things that people can do that help with the problem on the margin, but those marginal improvements don’t “add up to” a full solution. It isn’t the case that if we do enough peace talks and enough education and enough translation software, war will be permanently solved.
In a very important sense, humanity does not have traction on the problem of ending war, in a way that it does have traction on the problem of ending factory farming.
Ending war isn’t impossible in principle, but there are currently no levers that we can pull to make systematic progress towards a solution. We’re stuck doing things that might help a little on the margin, but we don’t know how to set up a system such that if that machine runs long enough, war will be permanently solved.
I’m going to call problems like this, where there does not exist a machine that is making systematic progress, “category II” problems. [tentative name “non-turn-crank problems”]
Category I vs. Category II
Some category I problems:
“Getting off the rock” (but possibly only because Elon Musk was born, and took personal responsibility for it)
Getting a computer to almost every human on earth
Legalizing / normalizing gay marriage
Possibly solving aging?? (and if so, probably only because a few people in the transhumanism crowd took personal responsibility for it)
Some category II problems:
Civilizational sanity / institutional decision making (on the timescale of the next century)
Achieving stable, widespread mental health
If you can draw a graph that shows the problem more-or-less steadily getting better over time, it’s a category I problem. If progress is being made in some sense, but progress happens in stochastic jumps, or it’s non-obvious how much progress was made in a particular period, it’s a category II problem.
In order for factory farming to not be solved, something surprising, from outside of my current model, needs to happen. (Like maybe technological progress stopping, or a religious movement that shifts the direction of society.)
Whereas in order for war to be solved, something surprising needs to happen. Namely, there needs to be some breakthrough, a fundamental change in our understanding of the problem, that gives humanity traction on the problem, and enables the construction of a machine that can make systematic, incremental progress.
(Occasionally, a problem will be in an in-between category, where there doesn’t yet exist a machine that is making reliable progress on the problem, but that isn’t because our understanding of the shape of the problem is insufficient. Sometimes the only reason there isn’t a machine doing this work is that no person or group of sufficient competence has taken heroic responsibility for getting a machine started.
For instance, I would guess that our civilization is capable of making steady, incremental progress on the effectiveness of cryonics, up to the point of cryonics being a reliable, functional technology. But progress on “better cryonics” is mostly stagnant. I think that the only reason there isn’t a machine incrementally pushing on making cryonics work is that no one (or at least no one of sufficient skill) has taken it upon themselves to solve that problem, thereby jumpstarting a machine that makes steady incremental progress on it. [ 1 ]
It is good to keep in mind that these category 1.5 problems exist, but they mostly don’t bear on the rest of this analysis.)
Maybe the most important thing: Category I problems get solved as a matter of course. Category II problems get solved after we stumble into a breakthrough that turns them into category I problems.
In this context, a “breakthrough” is “some change either in our understanding (the map) or in the world (the territory) that causes the shape of the problem to shift, such that humanity can now make reliable systematic progress towards a solution, unless something surprising happens.”
Properties of Category I vs. Category II problems
Category I: There exists a “machine” that is making systematic, reliable progress on the problem.
Category II: There isn’t a “machine” making systematic, reliable progress on the problem, and humanity doesn’t yet know how to make such a machine.
Category I: Marginal improvements can “add up” to a full solution to the problem.
Category II: Marginal improvements don’t “add up” to a full solution to the problem.
Category I: The problem will be solved, unless something surprising happens.
Category II: The problem won’t be solved until something surprising happens.
Category I: We’re mostly not depending on luck.
Category II: We’re substantially depending on luck.
Category I: Progress is usually incremental.
Category II: Progress is usually stochastic.
Category I: Progress is pretty “concrete”; it is relatively unlikely that some promising project will turn out to be completely useless.
Category II: Progress is “speculative”; it is a live possibility that any given piece of work that we consider progress will later prove completely useless in light of better understanding.
Category I: The bottleneck to solving the problem is the machine going better or faster.
Category II: The bottleneck to solving the problem is a breakthrough, defined as “some shift (either in our map or in the territory) that changes the shape of the problem enough that we can make reliable systematic progress on it.”
Category I: Solving the problem does not require any major conceptual or ontological shifts; progress consists of many constrained engineering problems.
Category II: Our understanding of the problem, or ontology of the problem, will change at least once, but most likely many times, on the path to a full solution.
Category I: There might be graphs that show the problem getting better, more-or-less steadily, over time.
Category II: It’s quite hard to assess how much progress is made in a given unit of time, or even whether exciting “milestones” actually constitute progress.
Category I: We know how to train people to fill roles in which they can reliably contribute to progress on the problem; mostly what is needed is effective people to fill those roles.
Category II: We have only shaky and tenuous knowledge of how to train people to make progress on the problem; mostly what’s needed is people who can figure out for themselves how to get oriented.
Category I: If all the relevant inputs were free, the problem would be solved or very close to solved (e.g., with a perpetual motion machine that produced arbitrarily large amounts of the ingredients for Impossible meat, factory farming would be over).
Category II: If the inputs were free, this would not solve the problem (e.g., with a hypercomputer, AI safety would not be solved).
Properties of Category I vs Category II problems
Luck and Deadlines
In general, I’m optimistic about category I problems, and pessimistic about category II problems (at least in the short term).
And crucially, humanity doesn’t know how to intentionally and systematically turn a given category II problem into a category I problem.
We’re not hopeless at it. It is not completely a random walk. Individual geniuses have sometimes applied themselves to a problem that we were too confused about to have any traction on, and produced a breakthrough that gives us traction. For instance, Newton giving birth to mathematicized physics. [Note: I’m not sure that this characterization is historically correct.]
But overall, when those breakthroughs occur, it tends to be in large part due to chance. We mostly don’t know how to make breakthroughs on particular category II problems on demand.
Which is to say, any given problem transitioning from II to I depends on luck.
And unfortunately, it seems like some of the category II problems I listed above 1) are crucial to the survival of civilization, and 2) have deadlines.
From my epistemic vantage point, it looks like if we don’t solve some subset of those problems before some unknown deadline (possibly as soon as this century), we die. That’s it. Game over.
Human survival depends on solving some problems for which we currently have no levers. There is nothing that we can push on to make reliable, systematic progress. And there’s no machine making reliable, systematic progress.
Absent a machine that is systematically ratcheting forward progress on the problem, there’s no strong reason to think that it will be solved.
Or to state the implicit claim more explicitly:
Large scale problems are solved only when there is a machine incrementally moving towards a solution.
There are a handful of large scale problems that seem crucial to solve in the medium term.
There aren’t machines incrementally moving towards solutions to those problems.
So by default, unless something changes, I expect that those problems won’t be solved.
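A small calculation can express the dependence on luck and deadlines. Assume, hypothetically, that a category II problem has a fixed chance per year of a breakthrough that turns it into a category I problem. Once that happens, the resulting machine needs a fixed number of years of turn-crank work to finish. The problem is then solved in time only if the breakthrough arrives early enough to leave those years before the deadline. The per-year probabilities, the 80-year deadline, and the 20-year post-breakthrough duration are all made-up inputs, not estimates.

```python
def p_solved_before_deadline(p_breakthrough_per_year: float,
                             deadline_years: int,
                             years_after_breakthrough: int) -> float:
    """Probability a category II problem is solved before a deadline.

    Hypothetical model: each year independently has the same chance of a
    breakthrough; after one, a category I "machine" needs a fixed number of
    years to finish the job, so the breakthrough must land within the first
    (deadline_years - years_after_breakthrough) years.
    """
    window = deadline_years - years_after_breakthrough
    if window <= 0:
        return 0.0
    return 1.0 - (1.0 - p_breakthrough_per_year) ** window


for p in (0.005, 0.02, 0.05):
    prob = p_solved_before_deadline(p, deadline_years=80, years_after_breakthrough=20)
    print(f"breakthrough chance {p:.1%}/year -> solved in time: {prob:.0%}")
```

The specific numbers are not the point. What matters is that the answer is driven almost entirely by the per-year breakthrough chance, a parameter that, per the argument above, nobody currently knows how to raise on demand.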
On AI alignment
There are people who dispute that AI risk is a category II problem, and they are accordingly more optimistic. I believe that Rohin Shah and Paul Christiano both think that there’s a pretty good chance that business-as-usual AI development will solve alignment as a matter of course, because alignment problems are on the critical path to making functioning AI.
That is, they think that there is an existing machine that is solving the problem: the AI/ML field itself.
If I understand them correctly, they both think that there is a decent chance that their EA-focused alignment work will have been counterfactually irrelevant in retrospect, but it still seems like a good altruistic bet to lay groundwork for alignment research now.
In the terms of my ontology here, they see themselves as shoring up the alignment-progress machine, or helping it along with some differential progress, just in case the machine turns out to have been inadequate to the task of solving the problem before the deadline. They think there is a substantial chance that their work will, in retrospect, turn out to have been counterfactually irrelevant. But because getting the AI alignment problem right seems so high-leverage for the value of the future, it is still a good altruistic bet to do work that makes it more likely, on the margin, that the machine will succeed.
This is in marked contrast to how I imagine the MIRI leadership is orienting to the problem: When they look at the world, it looks to them like there isn’t a machine ratcheting towards safety at all. Yes, there are some alignment-like problems that will be solved in the course of AI development, but largely by patches that invite nearest-unblocked strategy problems, and which won’t generalize to extremely powerful systems. As such, MIRI is [edit 2023: was] making a desperate effort to make or to be a machine that ratchets toward progress on safety.
I think this question of “is there a machine that is ratcheting the world towards more AI safety?” is one of the main cruxes between the non-MIRI optimists and the MIRI pessimists, and it is often overshadowed by the related but less crucial question of “how sharp will takeoff be?”
Over the past 3 years, I have regularly taught at AIRCS workshops. These are mainly a recruitment vehicle for MIRI, run as a collaboration between MIRI and CFAR.
At AIRCS workshops, one thing that we say early on is that AI safety is a “preparadigmatic field”, which is more or less the same as saying that AI alignment is a category II problem. AI safety as a field hasn’t matured to the point that there are known levers for making progress.
And, we explain, we’re going to teach some rationality techniques at the workshop, because those are supposed to help one orient to a preparadigmatic field.
Some people are skeptical that these funny rationality methods are helpful at all (which, to be clear, I think is a quite reasonable position). Other people give the opposite critique, “it seems like clear thinking and effective communication and generally making use of all your mind’s abilities, is useful in all fields, not just preparadigmatic ones.”
But this is missing the point slightly. It isn’t so much that these tools are particularly helpful for preparadigmatic fields; it’s that in preparadigmatic fields, this is the best we can provide.
More developed fields have standard methods that are known to be useful. We train aspiring physicists in calculus, because we have ample evidence that calculus is an extremely versatile tool for making progress on physics, for example. [another example would be helpful here]
We don’t have anything like that for AI safety. There are not yet standard tools in the AI safety toolbox that we know to be useful and that everyone should learn. We don’t have that much traction on the problems.
So as a backstop, we teach very general principles of thinking and problem solving, as a sort of “on-ramp” to thinking about your own thinking and how you might improve your own process. The hope is that this will translate into skill at getting traction on a confusing domain that doesn’t yet have standard methods.
When you’re flailing, and you don’t have any kind of formula for making research progress, it can make sense to go meta and think about how to think about how to solve problems. But if you can just make progress on the object level, you’re almost certainly better off doing that.
People sometimes show up at AIRCS workshops expecting us to give them concrete technical problems that they can try and solve, and are sometimes discouraged that instead we’re teaching these woo-y or speculative psychological techniques.
But, by and large, we DON’T have concrete, well-specified, technical problems to solve (note that this is a somewhat contentious claim; see the bit about AI safety above). The work that needs to be done is something like “wandering around in one’s confusion in such a way that one can crystallize well-specified technical problems.” And how to do that is very personal and idiosyncratic: we don’t have systematized methods for doing that, such that someone can just follow the instructions and get the desired result. But we’ve found that the woo-y tools seem to give people new levers and new perspectives for figuring out how to do this for themselves, so that’s what we have to share.
As a side note: I have a gripe that “rationality” has come to conflate two things. There’s “how to make progress on natural philosophy when you don’t have traction on the problem,” and separately, there’s “effective decision-making in the real world.” These two things have some overlap, but they are really pretty different. And I think that development on both of them has been hurt by lumping them under one label.
If I were to offer a critique of Effective Altruism, it would be this: EA in general doesn’t distinguish between category I and category II problems.
Of course, this doesn’t apply to every person who is affiliated with EA. But many EAs, and especially EA movement builders, implicitly think of all problems as category I problems. That is, they are implicitly behaving as if there exists a machine that will convert resources (talent, money, attention) into progress on important problems.
And, as I said, there are many problems for which there does exist a machine doing that. But in cases where there isn’t such a machine, because the crucial breakthrough that would turn the problem from category II to category I hasn’t occurred yet, this is counterproductive.
The sort of inputs that allow a category I problem-solving machine to go better or faster are just very different from the sort of inputs that make it more likely that humanity will get traction on a category II problem.
Ease of Absorbing Talent
For one thing, more people are often helpful for solving a category I problem, but are usually not helpful for getting traction on a category II problem. Machines solving category I problems can typically absorb people because, by dint of having traction, they are able to train people to fill useful roles in the machine.
Folks trying to get traction on a category II problem, by definition, don’t have systematic methods by which they can make progress. So they can’t easily train people to do known-to-be-useful work.
I think there are clusters that are able to make nonzero progress on getting traction, and that those clusters can sometimes usefully absorb people, but they basically need to be people who have a nontrivial ability to get traction themselves. Because the work to be done is trying to get traction on the problem, it doesn’t help much to have more people who are waiting to be told what to do: the thing that they need to do is figure out what to do. [ 2 ]
Benefit of Marginal Improvements
For another thing, because machines solving category I problems can generally absorb resources in a useful way, and because they are making incremental progress, it can be useful to nudge people’s actions in the direction of a good thing, without them shifting all the way to doing the “optimal” thing.
Maybe someone won’t go vegan, but they might go vegetarian. Or maybe a company won’t go vegetarian, but it can be persuaded to use humanely farmed eggs.
Maybe this person won’t change their whole career-choice, but they would be open to choosing more impact oriented projects within their career.
Maybe most people won’t become hard-core EAs, but if many people change their donations to be more effective on the margin, that seems like a win.
Maybe an AI development company won’t hold back the deployment of its AI system for years and find a way to ensure that it is aligned, but it can be convinced to hire a safety team.
For category I problems, interventions on the margin “add up to” significant progress on the problem. That is what it means to be a category I problem: there are at least some forms of marginal improvement that, in aggregate, solve the problem.
But in the domain of category II problems, marginal nudges like this are close to useless.
Because there is not a machine, to which people can contribute, that will incrementally make progress on the problem, getting people to be somewhat more aware of the problem, or care a little about the problem, doesn’t do anything meaningful.
In the domain of a category II problem, the thing that is needed is a breakthrough (or a series of breakthroughs) that will turn it into a category I problem.
I don’t know how to make this happen in full generality, but it looks a lot closer to a small number of highly talented, highly-invested people who are working obsessively on the problem than it looks like a large mass of people who are aware that the problem is important and will make marginal changes to their lives to help.
A machine learning researcher who is not interested in really engaging with the hard part of the problem of AI safety, because that would require backchaining from bizarre-seeming science-fiction scenarios, but who is working on a career-friendly paper that he has rationalized, via some hand-wavy logic, as “maybe being relevant to AI safety someday,” is, it seems to me, quite unlikely to uncover some crucial insight that leads to a breakthrough on the problem.
Even a researcher who is sincerely trying to help with AI safety, whose impact model is “I will get a PhD, and then publish papers about AI safety” is, according to me, extremely unlikely to produce a breakthrough. Such a person is acting as if there is a role that they can fill, and if they execute well in that role, this will make progress on the problem. They’re acting as if they are part of an existing solution machine.
But, as discussed, that’s not the way it works: we don’t know how to make progress on AI safety, there aren’t straightforward things to do that will reliably make progress. The thing that is needed is people who will take responsibility for independently attempting to figure out how to make progress (which, incidentally, involves tackling the whole problem in its entirety).
If a person is not thinking at all about how to get traction on the core problem, but is just doing “the sort of activities that an AI safety researcher does,” I think it is very unlikely that their activity is going to lead to the kind of groundbreaking work that changes our understanding of the problem enough that AI alignment flips from being in category II to being in category I.
In these category II cases, shifts in behavior, on the margin, do not add up to progress on the problem.
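A toy aggregation rule can capture the contrast between marginal nudges and a few obsessive people. In this hypothetical sketch, each person contributes some amount of engagement with the problem. For a category I problem, the contributions add up. For a category II problem, progress is modeled as requiring at least one person to cross a breakthrough threshold on their own, so the relevant quantity is the maximum, not the sum. The threshold and the engagement numbers are illustrative assumptions.

```python
from typing import List

SOLUTION_THRESHOLD = 1.0  # hypothetical amount of "progress" that counts as solved


def category_one_progress(contributions: List[float]) -> float:
    """Category I: a machine aggregates everyone's marginal contributions."""
    return sum(contributions)


def category_two_progress(contributions: List[float]) -> float:
    """Category II (toy model): only a single deep engagement yields a
    breakthrough, so shallow contributions do not accumulate."""
    return max(contributions, default=0.0)


many_nudged = [0.01] * 1000      # a large mass of people, each nudged a little
few_obsessed = [0.6, 0.9, 1.2]   # a handful of highly invested people

for name, group in (("many nudged", many_nudged), ("few obsessed", few_obsessed)):
    print(f"{name}: "
          f"category I solved={category_one_progress(group) >= SOLUTION_THRESHOLD}, "
          f"category II solved={category_two_progress(group) >= SOLUTION_THRESHOLD}")
```

The max rule is a caricature. As footnote [ 2 ] notes, people executing on ideas they trust can contribute once a critical mass of traction-seekers exists. The sketch only isolates why summing many small nudges, which works for category I, does not by itself move a category II problem.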
EA’s causal history
Because it doesn’t distinguish between category I and category II problems, EA as a culture has a tendency to try to solve all problems as if they are category I problems, namely by recruiting more people and directing them at AI alignment or AI policy or whatever.
This isn’t that surprising to me, because I think a large part of the memetic energy of EA came from the identification of a specific category I problem: There was a claim that a first world person could easily save lives by donating to effective charities focused on health interventions in the third world.
Insofar as this is true, there’s a strong reason to recruit as many people to be engaged with EA as possible: the more people involved, the more money moved, the more lives saved.
However, in the years since EA was founded, the intellectual core of the movement updated in two ways:
Firstly, it now seems more dubious (to me at least) that lives can be cheaply and easily saved that way. (I’m much less confident of this point, and it isn’t a crux for me, so I’ve removed discussion of it to an endnote.[ 3 ] )
And more importantly, EA realized that there are vastly more important problems than global poverty.
X-risk has been a thread in EA discourse from the beginning: one of the three main intellectual origins of EA was LessWrong (the other two being GiveWell coming from the finance world, and Giving What We Can / 80,000 hours stemming from some Oxford philosophers). But sometime around 2014 the leadership of EA settled on long-termism and x-risk as the “most important thing”. (I was part of the volunteer team for EAG 2015, and I saw firsthand that there was an explicit push to prioritize x-risk.)
Over recent years that shift has taken form: deprioritizing earning to give, but promoting careers in AI policy, for instance.
I claim that this pivot represents a more fundamental shift than most in EA realize. Namely, a shift from EA being the sort of thing that is attempting to make progress on a category I problem to EA being the sort of thing attempting to make progress on a category II problem.
EA developed as a movement for making progress on a category I problem: It had as a premise that ordinary people can do a lot of good by moving money to (mostly) pre-existing charities, and by choosing high impact “careers” (where a “career” implies an already-mapped out trajectory). Category I orientation is implicit in EA’s DNA.
And so when the movement tries to make the pivot to focusing on x-risk, it implicitly orients to that problem as if it were a category I problem: “Where can we direct money and talent to make an impact on x-risk?”
For all of the reasons above, I think this is a fundamental error: the inputs that lead to progress on a category I problem are categorically different from those that lead to progress on a category II problem.
To state my view starkly, if somewhat uncharitably, EA is essentially shoveling resources into a nonexistent x-risk progress machine, imagining that if they shovel enough, this will lead to progress on x-risk and the other core problems of the world. As I have said, I think that there isn’t yet a machine that can make consistent incremental progress in that way.
But it would be pretty hard, I think, for EA to do something different. This isn’t a trivial error to correct: changing this would entail changing the fundamental nature of what EA is and how it orients to the world.
[ 1 ] – I’ve heard Nate Soares refer to “the curse of cryonics,” which is that anyone who has enough clear, independent thought to realize that cryonics is important can also see that there are vastly more important problems.
[ 2 ] – I think that this is a little bit of an oversimplification. I think there are ways that people can contribute usefully in a mode that is close to “executing on what some people they trust think is a good idea”, but you do need a critical mass of people who are clawing for traction themselves, or this doesn’t work. Therefore the limiting reagent is people clawing for traction, and the capacity to absorb conscientious, ability-to-execute talent is limited.
[ 3 ] – The world is really complicated, and I’m not sure how confident to be that our charitable interventions are working. This post by Ben Hoffman, which points out that the expected value distribution for deworming interventions extends into the negative (something most EAs don’t seem aware of), seems on point.
Further (though this is my personal opinion, more than any kind of consensus), the argument Eliezer makes here is extremely important for anyone taking aim at eradicating poverty. If there is some kind of control system that keeps people poor, regardless of the productivity of society, this suggests that there might be some fundamental blocker to ending poverty: until that control system is addressed somehow, all of the value ostensibly created by global health interventions is being sucked away.
(Admittedly, this argument holds somewhat less force if one is aiming simply to reduce human suffering in the next year, rather than any kind of long-term goal like “permanently and robustly end poverty.”)
Considerations like that one suggest that we should be much less certain that our favored global poverty interventions are effective. Instead of (or perhaps in addition to) rushing to move as many resources to those interventions as possible, it seems like the more important thing is to continue trying to puzzle out, via experiments and modeling and whatever tools we have, how the relevant social systems work, to verify that we’re actually having the positive effects that we’re aiming for. It seems to me that even in the domain of global poverty, we still need a good deal of exploration relative to exploitation of the interventions we’ve uncovered.
Relatedly, it seems to me that focusing on charity is somewhat myopic: it is true that there is a machine eradicating poverty, but that machine is called capitalism, not charitable donation. Maybe the charitable donations help, but I would guess that if you want to really have the most impact here, the actual thing to do is not to give to charities but something more sophisticated that engages more with that machine. (I might be wrong about that. Maybe global health interventions are in fact the best way to unblock the economic engine so that capitalism can pull the third world out of poverty faster.)
[Draft. This post really has a lot of prerequisites that I’m not going to bother trying to explain. I’m just writing it to get it out of me. I’ll have to come back and make it understandable later, if that seems worth doing. This is really not edited.]
We live in an inadequate world. Things are kind of a mess. The vast majority of human resources are squandered, by Moloch, on ends that we would not reflectively endorse. And we’re probably all going to die.
The reason the world is so messed up can be traced back to a handful of fundamental problems or fundamental constraints. By “fundamental problem” I have something pretty specific in mind, but Inadequate Equilibria points in the right direction. They’re the deep reasons why we can’t “just fix” the world’s problems.
Some possible fundamental problems/constraints, which I haven’t yet done the work to formulate correctly:
The world is too big and fast for any one person to know all of the important pieces.
The game-theoretic constraints that make rulers act against the common good.
People in power take power-preserving actions, so bureaucracies resist change, including correct change.
People really want to associate with prestigious people, and make decisions on that basis.
We can’t figure out what’s true anywhere near efficiently enough.
People can’t actually communicate about the important things.
We don’t know how, even in principle, to build an aligned AGI.
Molochian race dynamics.
Everyone is competing to get information to the people with power, and the people in power don’t know enough to know who to trust.
We’re not smart enough.
There is no system that is keeping track of the wilderness between problems.
I recently had the thought that some of these problems have different characters than the others. They fall into two camps, which, of course, actually form a spectrum.
For some of these problems, if you solved them, the solution would be self-aligning.
By that I mean something like: for some of these problems, their solutions would be a pressure or force that would push towards solving the other problems. In the best case, if you successfully solved that problem, in due course this would cause all of the other problems to get solved automatically. The flow-through effects of such a solution are structurally positive.
For other problems, even though they represent a fundamental constraint, solving them wouldn’t push towards solving the other problems. In fact, solving that one fundamental problem in isolation might make the other problems worse.
A prototypical case of a problem whose solution is self-aligning [I need to come up with better terminology] is Aligned AI. If we knew how to build an AI that could do what we actually want, this would perhaps automatically solve all of our other problems. It could tell us how to have robust science, or optimal economic policy, or incentive-aligned leaders, or whatever (if not fix those problems itself).
Aligned AI is the lollapalooza of altruistic interventions. We can solve everything in one sweep. (Except, of course, the problems that were prerequisites for solving aligned AI. Those we can’t count on the AI to solve for us.)
Another example: If we implemented robust systems that incentivized leaders to act in the interests of the public good, it seems like this has the potential of (eventually) breaking all of the other problems. It would be a jolt that knocks our civilization into the attractor basin of a sane, adequate civilization (if our civilization is not in that attractor basin already).
In contrast, researcher ability is a fundamental constraint of our civilization (though maybe not a fundamental problem?), but it is not obvious that the flow-through effects of breaking through that fundamental constraint are structurally positive. On the face of it, it seems like it would be bad if everyone in the world doubled their research acumen: that seems like it would speed us toward doom.
This gives a macro-strategic suggestion, and a possible solution to the last term problem: identify all of the fundamental problems that you can, determine which ones have self-aligning solutions, and dedicate your life to solving whichever problem has the best ratio of tractability to size of (self-aligned) impact.
I may be reinventing symmetric vs. asymmetric weapons here, but I think I am actually pointing at something deeper, or at least extending the idea further.
[Edit/note to self: I could maybe explain this with reference to personal productivity? You want to find the thing that is easy to do but that most makes the other things easy to do. I’m not sure this captures the key thing I want to convey.]
[Note: I learned this concept directly from John Salvatier. All credit for the ideas goes to him. All blame for the incoherence of this post goes to me.]
This post doesn’t have a payoff. It’s just laying out some ideas.
Some actions are “controlled”, which is to say their consequences are very precisely determined by the actor.
The term is a reference to, for instance, a controlled demolition. In a controlled demolition, a building is made to collapse in a specific pattern; an uncontrolled demolition would just knock the building over, without any particular concern for how or where the pieces go.
The following are some axes that influence how controlled an action is.
How precisely predictable the effects of the action are
Rocket launches are highly controlled, in that one can precisely predict the trajectory of the rocket. Successfully changing the social norms around dating, sex, and marriage (or anything, really) is uncontrolled, because human society is a complicated knot of causal influences, and it is very hard to know in advance what the downstream impacts will be.
(In general, actions that involve physical deterministic systems are more controlled than actions that involve human minds.)
How reversible the results of an action are
But you don’t need to be able to predict the results of your actions for them to be controlled, if your actions are reversible.
Dynamiting a mountain (even via a controlled demolition) is less controlled than cutting down a forest, which is less controlled than turning on a light.
How much you “own” the results of your actions
Inventing and then open-sourcing a new technology is uncontrolled. Developing proprietary software is more controlled, because you have more ability to dictate how the software is used (though the possibility of copycats creating similar software limits your control). Developing software that is only used within one’s own organization is more controlled still.
Processes that are self-perpetuating or that take on a life of their own (for instance, sharing an infectious idea, which then spreads and mutates) are extremely uncontrolled.
How large or small the step-size of the action is and how frequent the feedback is
It is more controlled to cut down one tree at a time, and check the ecological impact after each felling, than it is to check the ecological impact only after the whole forest has been removed. Careful, gradual change is more controlled.
(Unfortunately, many actions have different effects at large scales than at small scales, and so one doesn’t get information about their impacts until the action is mostly completed.)
In general, there’s a pretty strong tradeoff between the effect sizes of one’s actions and how controlled they can be. It’s easy to keep many small actions controlled, and nigh-impossible to keep many large actions controlled.
Problems requiring high control
Some problems inherently require high-control solutions. Most construction projects are high-control problems, for instance. Building a skyscraper depends on hundreds of high-precision steps, with the later steps depending on the earlier ones. Building a watch is a similarly high-control problem.
In contrast, there are some problems for which low-control solutions are good enough. In particular, when only a single variable of the system being optimized needs to be modified, low-control solutions that move that variable (in the right direction) are sufficient.
For instance, removing lead from the environment is a moderately low-control action (hard to reverse, hard to predict all the downstream consequences, the actor doesn’t own the effects), but it turns out that adjusting that one variable is a very good move. (Probably. The world is actually more confusing than that.)