Some thoughts on Effective Altruism – Two memes

Perhaps the most important thing to understand about EA is that, from the beginning, it was composed of two entwined memes.

Charity is inefficient

The first of these memes was that by using reason and evidence, you could do much more good with charity than the default.

In the early 2010s, there were some nerds on the internet (LessWrong and GiveWell and some other places) writing about, among other things, how to optimize charitable giving.

It seemed like the charity sector, in general, was not very efficient. With careful thought and research you could easily outcompete the charity sector as a whole, making investments that are orders of magnitude more effective than donating with a typical amount of thought.

The basic claim here is about competence: by being more thoughtful and rational, by doing research, we can outcompete a charitable industry that isn’t trying very hard, at least by our standards.

Normal people can save lives

The second meme was that with relatively small lifestyle changes, a normal person in the first world could do a shocking amount of good, and that it is good for people to do so.

My guess is that this meme started with Peter Singer’s drowning child paper, which argued that by spending tiny amounts of money and giving up some small creature comforts, one could literally save lives in the third world. And given that, it’s really really good when people decide to live that way; the more people who do, the more lives are saved. The more resources that are committed to these problems, the more good we can do.

This basic meme became the seed of Giving What We Can, for instance.

Note that in its initial form, this was something like a redistributive argument: We have so much more stuff than others who are in dire need that there’s a moral pressure (if not an obligation) to push for wealth transfers that help address the difference.

To summarize the difference between these ideas: One of these is about using the altruistic budget more effectively, and the other is about expanding the size of the altruistic budget.

Synergies

These memes are obviously related. They’re both claims about the outsized impact one can have via charitable donation (and career prioritization).

The claim that a first-worlder can save lives relatively cheaply means that there’s not an efficient market in saving lives.

And in my experience at least, the kind of analytical person that is inclined to think “One charity is going to be the most effective one. Which one is that?” also tends to be the kind of person that thinks, at some point in their life, “There are people in need who are just as real as I am, and the money I’m spending could be spent on helping them. I should give most of my money away.” There’s an “EA type”, and both of these memes are appealing to that type of person.

So in some ways these two memes are synergistic. Indeed, they’re synergistic enough that they fused together into the core of the Effective Altruism movement.

Tensions

But, as it turns out, those memes are also in tension with each other in important respects, in ways that played out in cultural conflicts inside of EA, as it grew.

These two memes encode importantly different worldviews, which have different beliefs about the constraints on doing more good in the world. To make an analogy to AI alignment, one of these views holds that the limiting factor is capability, how much of an influence you can have on the world, and the other holds that the limiting factor is steering, the epistemology to identify outsized philanthropic opportunities.

Obviously, both of these can be constraints on your operation, and doing the most good will entail finding an optimal point on the Pareto frontier of their tradeoff.

Implications of a “resources moved” frame

If the operating model is that the huge philanthropic gains are primarily redistributive, then the primary limiting factor is the sheer quantity of resources moved. That tends to imply…

An Effective Altruism community that wants to be a mass movement

If the good part of EA is more people deciding to commit more of their resources to effective charities, then converting as many people as possible to EA becomes a high priority.

Indeed this was the takeaway from Peter Singer’s TED talk all the way back in 2013: You can do more good by donating to effective charities than you can by being a doctor (a commonly recognized way of “helping people”), but you can do even better than that by converting more people to EA.

Branding matters

If you want to spread that core message, to get more people to donate more of their resources to effective charities, there’s an incentive to reify the EA brand, to be the sort of thing that people can join, rather than an ad hoc collection of ideas and bloggers.

And branding is really important. If you want lots of people to join your movement, it matters a lot if the general public perception of your movement is positive.

Implications of a “calling the outsized opportunities” frame

In contrast, if you’re operating on a model where the philanthropic gains are the result of doing better and more careful analysis, the limiting constraint on the project is not (necessarily) getting more material resources to direct, but the epistemic capability to correctly pinpoint good opportunities. This tends to imply…

Unusually high standards of honesty and transparency are paramount

If you’re engaged in the intellectual project of trying to figure out how the world works, and which interventions make things better, it is an indispensable feature of your epistemic community that it has strict honesty norms that are firmly at Simulacrum level 1.

You need expectations for what counts as honest that are closer to those of the scientific community than to the standards of marketing.

We might take for granted that in many facets of the world, people are not ever really trying to be accurate (when making small talk or crafting slogans) and that people and organizations will put their best foot forward, highlighting their success and quietly downplaying failures.

But that normal behavior is counter to a collective epistemic process of putting forward and critiquing ideas, and learning from (failed) attempts to do stuff.

Furthermore, if you have a worldview that holds that the charity sector is incredibly inefficient, you’re apt to ask why that is. And part of the answer is that this kind of normal “covering one’s ass” / “putting forward a good face” behavior kills the accountability that would cause charities to be effective in their missions. This background tends to make people more paranoid about these effects in their “let’s try to outcompete the charity sector” community.

“More people” is not an unadulterated good

Adding more people to a conversation does not, in the typical case, make the reasoning of that conversation more rigorous.

A small community of bloggers and intellectuals engaged in an extended conversation, aiming to make progress together on some questions about how to most effectively get altruistic gains at scale, doesn’t necessarily benefit from more participants.

And it definitely doesn’t benefit from the indiscriminate addition of participants. Only a small number of people will contribute more to the extended conversation than the communication and upfront enculturation costs that each new person imposes.

These two worldviews give rise to two impulses in the egregore of EA: the faction that is in favor of professionalism, and the faction in vocal support of epistemics and integrity.

We see this play out every time someone posts something arguably unseemly on the EA forum, and someone comments, “I think it was bad to post this; it makes EA look bad and has a negative impact in expectation.”

And I think the tension between these impulses goes a long way towards explaining why EA seems so much less promising to me now than it did 5 years ago.
I have more to say here, about how the incentive to do the hard work of rigorously thinking things through and verifying lines of argument is somewhat self-cannibalizing, but I don’t feel like writing all of that right now, so I’m shipping this as a post.

Request for parallel conditional-market functionality

In response to James’ plan for a Manifold dating site, I just wrote the following comment.

I think this needs a new kind of market UI, to setup multiple conditional markets in parallel. I think this would be useful in general, and also the natural way to do romance markets, in particular.

What I want is for a user to be able to create a master (conditional) prompt that includes a blank to be filled in, e.g. “If I go on a date with ____, will we end up in a relationship 2 months later?” or “If I read ____ physics textbook, will I be impressed with it?” or “Will I think the restaurant ____ is better than the Butcher’s Son, if I eat there?” The creator of this master question can include resolution details in the description, as always.

Then other users can come and submit specific values for the blank. In these cases, they suggest people, physics textbooks, or restaurants.

However (and this is the key thing that makes this market different from the existing multiple choice markets), every suggestion becomes its own market. Each suggestion gets a price between 0% and 100%, rather than all of the suggestions together adding up to a probability of 100%.

After all, it’s totally possible that someone would end up in a relationship with Jenny (if they end up going on a date with Jenny) and also end up in a relationship with George (if they go on a date with George). And it’s likely that there are multiple restaurants that one would like better than the Butcher’s Son. There’s no constraint that all the answers have to sum to 100%.

(There are other existing markets that would make more sense with this format. Aella’s one-night stand market for one, or this one about leading AI labs. It’s pretty common for multiple choice questions to not need to sum to 100% probability, because multiple answers can be correct.)
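
To make the shape of this concrete, here’s a minimal sketch of the data model I have in mind. This is purely illustrative: none of these type or field names are part of Manifold’s actual API, and details like how the subsidy gets split across submarkets are left open.

```typescript
// Hypothetical data model for a "parallel conditional market" set.
// Nothing here is Manifold's real API; it's just a sketch of the structure
// described above, where each fill-in becomes its own independent market.

interface MasterQuestion {
  id: string;
  creatorId: string;
  // Template with a blank, e.g. "If I go on a date with ____,
  // will we end up in a relationship 2 months later?"
  template: string;
  resolutionDetails: string;
  totalSubsidy: number; // how to allocate this across submarkets is an open question
}

interface Submarket {
  id: string;
  masterId: string;    // which master question this fills in
  suggestedBy: string; // the user who submitted this value for the blank
  fillIn: string;      // e.g. "Jenny", or a specific textbook or restaurant
  probability: number; // each submarket carries its own price in [0, 1]
}

// The key property: each submarket is an independent binary market,
// so there is deliberately no constraint that the probabilities sum to 1.
function isValidSet(submarkets: Submarket[]): boolean {
  return submarkets.every((m) => m.probability >= 0 && m.probability <= 1);
}

// Example: two date candidates can both be likely to lead to a relationship.
const example: Submarket[] = [
  { id: "s1", masterId: "m1", suggestedBy: "alice", fillIn: "Jenny", probability: 0.6 },
  { id: "s2", masterId: "m1", suggestedBy: "bob", fillIn: "George", probability: 0.55 },
];
console.log(isValidSet(example)); // true, even though 0.6 + 0.55 > 1
```

The contrast with the existing multiple choice format is exactly that last check: there’s no normalization step across answers, because the answers are conditional on different worlds.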

Currently, you can create a bunch of conditional markets yourself. But that doesn’t work well for romance markets in particular, for two reasons.

1. Most of the value of these markets is in discovery. Are there people who the market thinks that I should go on a date with, who I’ve never met?
2. It is very socially forward to create a market “Will I be in a relationship with [Jenny], if we go on one date?” That means revealing, to Jenny and to all the onlookers, that I’m thinking about Jenny enough to make a market, which could be embarrassing to her. It’s important that the pairings are suggested by other people, and mixed in with a bunch of other suggestions, instead of highlighted in a single top-level market. Otherwise it seems like this is pushing someone’s personal life into a public limelight too much.

If this kind of market UI existed, I would immediately create an “If Eli goes on a date with ____, will they be in a relationship 3 months later?” market, with a link to my existing dating doc and a large subsidy (we’d have to think about how to allocate subsidies across all the markets in a set).

In fact, if it were possible and legal to do this with real money, I would probably prefer spending $10,000 subsidizing a set of real-money prediction markets of this form to spending $10,000 on a matchmaker. I just expect the market (and especially “the market” when it is composed of people who are one or two degrees removed from people that I might like to date) to be much better at suggesting successful pairings.

A letter to my 20-year-old self

If I could send some advice back in time, to myself when I was 20 years old, this is a lot of what I would say. I think almost all of this is very idiosyncratic to me, and the errors that I, personally, am inclined towards. I don’t think that most 20-year-olds who are not me should take these points particularly seriously, unless they recognize themselves in them.

[See also First conclusions from reflections on my life]

  1. Order your learning

You want to learn all skills, or at least all the awesome and useful ones. This is completely legitimate. Don’t let anyone tell you that you shouldn’t aim for that (including with words like “specialization” or “comparative advantage”.)

But because of this, every time you encounter something awesome, you respond by planning to make the practice of it part of your life in the short term. This is a mistake. Learning most things will require either intense bouts of focusing on only that one thing for (at least small numbers of) days at a time, or consistent effort over weeks or months. 

If every time you encounter some skill that seems awesome or important, you resolve to learn it, this dilutes your focus, which ends up with you not learning very much at all. Putting a surge of effort into something and then not coming back to it for some weeks is almost a total waste of that effort—you’ll learn almost nothing permanent from that.

The name of the game is efficiency. You should think of it like this:

Your skill and knowledge, at any given time, represents a small volume in a high dimensional space. Ultimately you want to expand in all or almost all directions. There’s no skill that you don’t want, eventually. But the space is very high dimensional and infinite, so trying to learn everything that crosses your path won’t serve you that well. You want to order your learning.

Your goal should be to plot a path, a series of expansions in this high dimensional space, that expands the volume as quickly as possible. Focus on learning the things that will make it easier and faster to continue to expand along the other dimensions, instead of focusing on whatever seems cool or salient in the moment.

[added:] More specifically, you should be willing to focus on doing one thing at a time (or one main thing, with one or at most two side projects). Be willing to take on a project, ideally but not necessarily involving other people, and make it your full-time job for at least a month. You’ll learn more and make more progress when you’re not dividing your efforts. You won’t lose nearly as much time in the switching costs, because you won’t have to decide what to do next: there will be a clear default. And if you’re focusing on one project at a time, it’s much easier to see if you’re making progress. You’ll be able to tell much faster if you’re spinning your wheels doing something that feels productive, but isn’t actually building anything. Being able to tell that you failed at a timeboxed goal means that you can notice and adapt.

A month might feel like a long time, to put aside all the other things you want to learn, but it’s not very long in the grand scheme of things. There have been many months since I was 20, and I would be stronger now, if I had spent more of them pushing hard on some specific goal, instead of trying to do many good things and scattering my focus.

You want to be a polymath; but the way to polymathy is not trying to do everything all at once: it’s mostly going intensely at several different things, in sequence.

  2. Learn technical skills

In particular, prioritize technical skills. They’re easier to learn earlier in life, and I wish I had a stronger grounding in them now.

First and foremost, learn to program. Being able to automate processes, and build simple software tools for yourself is a superpower. And it is a really great source of money.

Then, learn calculus, linear algebra, differential equations, microeconomics, statistics, probability theory, machine learning, information theory, and basic physics. [Note that I’ve so far only learned some of these myself, so I am guessing at their utility].

It would be a good use of your time if you dropped everything else and made your only priority in the first quarter of college to do well in IBL calculus. This would be hard, but I think you would make substantial steps towards mathematical maturity if you did that.

In general, don’t bother with anything else in college, except learning technical subjects. I didn’t find much in the way of friends or connections there, and you’ll learn the non-technical stuff fine on your own.

The best way to learn these is to get a tutor, and walk through the material with the tutor on as regular a basis as you can afford.

  3. Prioritize money

You’re not that interested in money. You feel that you don’t need much in the way of “stuff” to have an awesome life. You’re correct about that. Much more than most of the people around you, you don’t want or need “nice things”. You’re right to devalue that sort of thing. You’ll be inclined to live frugally, and that has served me very well.

However, you’re missing that money can be converted into learning. Having tens or hundreds of thousands of dollars is extraordinarily helpful for learning pretty much anything you care to learn. If nothing else, most subjects can be learned much faster by talking with a tutor. When you have money, if there’s anything you want to learn, you can just hire someone who knows it to teach you how to do it, or to do it with you. This is an overpowered strategy.

It is a priority for you to get to the point that you’re making (or have saved) enough money that you feel comfortable spending hundreds of dollars on a learning project.

Combining 1, 2, 3, the thing that I recommend that you do now is drop almost everything and learn to become a good programmer. Your only goal for the next few months should be 1) to have enough money for rent and food, and 2) to become a good enough programmer that you can get hired for it, as quickly as you can. Possibly the best way to do this is to do a coding boot camp, instead of self-teaching. You should be willing to put aside other cool things that you want to do and learn, for only a couple of months, to do this.

Then get a job as a software engineer. You should be able to earn in the low hundreds of thousands of dollars a year with a job like that, while still having time to do other stuff you care about in your off hours. If you live frugally, you can work for 2.5 years and come away with a small, but large enough (e.g. >$100k), nest egg for funding all the other skills that you want to learn.

(If you’re still in college, staying to do IBL first, and then focusing on learning programming, isn’t a bad idea. It might be harder to get mathematical maturity, in particular, outside of college.)

  4. Make things / always have a deliverable

I’ve gained much much more skill over the course of projects where I was just trying to do something, than from the sum of all my explicit learning projects. Mostly you learn skills as a side effect of doing things. This just works better than explicit learning projects. 

This also means that you end up learning real skills, instead of the skills that seem abstractly useful or cool from the outside, many of which turn out to have not much relevance to real problems. Which is fine; you can pursue things because they’re cool. But very often, what is most useful and relevant are pieces that are too mundane to come to mind, and doing real things reveals them. Don’t trust your abstract model of what elements are useful or relevant or important or powerful too much. Better to let your learning be shaped to the territory directly, in the course of trying to do specific things.

The best way to learn is to just try to do something that you’re invested in, for other reasons, and learn what you need to know to succeed along the way. Find some software that you wish existed, that you think would be useful to you, and just try and build it. Run a conference. Take some work project that seems interesting and knock it out of the park. 

Try to learn as much as you can this way.

In contrast, I’ve spent a huge amount of time thinking over the years that didn’t create any value at all. If I learned something at the time, I soon forgot it, and it is completely lost to me now. This is a massive waste.

So your projects should always have deliverables. Don’t let yourself finish or drop a project, especially a learning project, until you have produced some kind of deliverable. 

A youtube video of yourself explaining some new math concept. A lecture for two friends. Using a therapy technique with a real client.

A blog post jotting down what you learned, or summarizing your thoughts on a domain, is the minimum viable deliverable. If nothing else, write a blog post for everything that you spend time on, to capture the value of your thinking for others, and for yourself later.

Don’t wait to create a full product at the end. Ship early, ship often. Create intermediate deliverables, capturing your intermediate progress, at least once a day. Write / present about your current thoughts and understanding, including your open confusions. (I’ve often gotten more clarity about something in the process of writing up my confusions in a blog post).

The deliverable can be very rough. But it shouldn’t be just your personal notes. If you’re writing a rough blog post, write it as if for an audience beyond yourself. That will force you to clarify your thoughts and clearly articulate the context much more than writing a personal journal entry. In my experience, the blog posts that I write like this are usually more helpful for my future self than the personal journal entries are.

The rule should be that someone other than you, in principle, could get value from it. A blog post or a recorded lecture, that no one reads, but someone could read and find interesting counts. The same thing, but on a private google drive, doesn’t count. (Even better, though, is if you find just one person who actually gets value out of it. Make things that provide value to someone else.)

Relatedly, when you have an idea for a post or an essay, write it up immediately, while the ideas are alive and energizing. If you wait, they’ll go stale and it is often very hard to get them back. There are lots of thoughts and ideas that I’ve had, which are lost forever because I opted to wait a bit on writing them down. This post is itself the result of some thoughts that I had while listening to a podcast, which I made a point to write up while the thoughts were alive in me.

  5. Do the simple thing first

You’re going to have many clever ideas for how to do things better than the default. I absolutely do not want to discourage you in that.

But it will behoove you to start by doing the mundane, simple thing. Try the default first, then do optimizations and experiments on top of that, and feel free to deviate from the default when you find something better.

If you have some fancy idea for how to use spaced repetition systems to improve your study efficiency, absolutely try that. But start by doing the simple thing of sitting down, reading the textbook, and doing the exercises, and then apply your fancy idea on top of that.

You want to get a baseline to compare against. And oftentimes, clever tricks are less important than just putting in the hours doing the work, and so you want to make sure to get started doing the work as soon as possible, instead of postponing it until after you’ve developed a clever system. Even if your system is legitimately clever, if the most important thing is doing the hard work, you’ll wish you started earlier.

You’re sometimes going to be more ambitious than the structures around you expect of you. That’s valid. But start with the smaller goals that they offer, and exceed them, instead of trying to exceed them in one fell swoop.

When you were taking Hebrew in high school, you were unimpressed by the standards of the class and held yourself higher than them. For the first assignment, you were to learn the first list of vocabulary words from the book, for the next week. But you felt that you were better than that, and resolved to study all the vocab in the whole book (or at least a lot of it) in that period, instead.

But that was biting off more than you could easily chew, and (if I remember correctly), when you came back the next week, you had not actually mastered the first vocab list. You would have done better to study that list first, and then move on to the rest, even if you were going to study more than was required.

I’ve fallen into this trap more than once: “optimizing” my “productivity” with a bunch of clever hacks or ambitious targets that ultimately mask the fact that my output is underperforming what very mundane work habits would produce, for instance.

You might want to work more and harder than most people, but start by sticking to a regular workday schedule, with a weekend, and then you can adjust it, or work more than that, from there.

Don’t fall into the trap of thinking that the simple thing that everyone else is doing is beneath you, since you’re doing a harder or bigger thing than that. Do the simple thing first, and then do more or better.

I’m sure there’s more to say, but this is what was pressing on me last night in particular.

The meta-institutional bottleneck

The world is pretty awful, in a number of ways. More importantly, the world is, in an important way, undershooting its potential.

Many of our problems are directly downstream of inadequate institutions, which incentivize those problems, or at least fail to incentivize their solutions. For virtually any part of the world that sucks, you can point to poor institution design that is giving rise to sub-adequate outcomes. [ 1 ]

For many of these broken institutions, there exist proposed alternatives. And for many of those alternatives, we have strong analytic arguments that they would improve or outright solve the relevant problems.

  • A Land Value Tax would likely solve many to most of the human problems of inequality (those that are often asserted to be a consequence of “capitalism”, such as many people working bullshit jobs that they hate, for their whole life, and still being too poor to have practical freedom)
  • There’s a laundry list of innovations that would solve the incentive problems of academia that lead to some large fraction of published research being wrong.
  • Prediction markets could give us demonstrably calibrated estimates on most questions of fact, taking into account all available info.
  • Approval voting is one of a number of voting systems that avoid the pitfalls of first-past-the-post elections. But we still use first-past-the-post and so we live with politicians that almost no one likes or supports.

The world is broken. But it largely isn’t for lack of ideas: we (or at least someone) know how to do better. The problem is that we are blocked on implementing those good ideas.

As Yudkowsky puts it in Inadequate Equilibria…

Usually when we find trillion-dollar bills lying on the ground in real life, it’s a symptom of (1) a central-command bottleneck that nobody else is allowed to fix, as with the European Central Bank wrecking Europe, or (2) a system with enough moving parts that at least two parts are simultaneously broken, meaning that single actors cannot defy the system. To modify an old aphorism: usually, when things suck, it’s because they suck in a way that’s a Nash equilibrium.

But that doesn’t mean that there’s some fundamental law of nature that those solutions are untenable. If the proposed alternative is any good, it is also a stable equilibrium. If we moved to that world, we would probably stay there.

Or as Robin Hanson says of prediction markets in an interview,

I’d say if you look at the example of cost accounting, you can imagine a world where nobody does cost accounting. You say of your organization, “Let’s do cost accounting here.”

That’s a problem because you’d be heard as saying, “Somebody around here is stealing and we need to find out who.” So that might be discouraged.

In a world where everybody else does cost accounting, you say, “Let’s not do cost accounting here.” That will be heard as saying, “Could we steal and just not talk about it?” which will also seem negative.

Similarly, with prediction markets, you could imagine a world like ours where nobody does them, and then your proposing to do it will send a bad signal. You’re basically saying, “People are bullshitting around here. We need to find out who and get to the truth.”

But in a world where everybody was doing it, it would be similarly hard not to do it. If every project with a deadline had a betting market and you say, “Let’s not have a betting market on our project deadline,” you’d be basically saying, “We’re not going to make the deadline, folks. Can we just set that aside and not even talk about it?”

From a zoomed out view, we can see better incentive schemes that would give us more of what we want, that would allow us to run civilization better. But there’s no straightforward way to transition from our current systems to those better ones. We’re blocked by a combination of collective action problems and powerful stakeholders who benefit from the current equilibrium. So better institutional design remains theoretical.

One step meta

When I look at this problem, it strikes me that this is itself just another institution-design problem. Our norms, incentives, and mechanisms for moving from one equilibrium to another leave a lot to be desired. But there could exist institutional setups that would solve or preempt the collective action and opposed-stakeholder problems that prevent us from transitioning out of inadequate equilibria.

This isn’t something that we, as a species, know how to do, in actual practice. But it is something that we could know how to do. And if we did, that would let us build incentive-aligned institutions up to the limit of our theoretical knowledge of which institutions would produce the outcomes that we want. 

And this would, in turn, put the planet on track to adequacy for literally every other problem we face.

I call this the “meta-institutional bottleneck”. I offer it as a contender for the most important problem in the world.

Is this one thing or many?

It may be that this “meta-institutional bottleneck” is really just a name for the set of specific problems of moving from a specific inadequate equilibrium to its corresponding more adequate one. We have many such problems, but they’re different enough that we effectively have to solve each one on its own terms.

My guess is that that’s not true. I posit that there are enough commonalities in the difficulties of solving each inadequate equilibrium that we can find at least semi-general solutions to those difficulties.

If I imagine someone attempting to intentionally push such equilibrium-switches, over a 60 year career, I imagine that they would learn things that generalize from one problem to the next. Surely there would be plenty of idiosyncratic details to each problem, but I imagine that there would also be common patterns for which one might develop generally applicable approaches.

I have not solved any such problems, at scale, so I’m only speculating. But some possible patterns / approaches that one might discover:

  • You might learn that there are a small number of people in a social network / prestige hierarchy who are disproportionately leveraged in switching the behavior of a whole network. If you can convince those people to change their behavior, the rest of the network will follow.
    • Certainly many of these people will be prestigious individuals that are easy to identify. But it could also be the case that there are high-leverage nodes that are not obviously powerful or prestigious, but who nevertheless end up being trend-setters in the network. There may or may not be an arbitrage opportunity in getting the people with the most leverage relative to their power/prestige to switch, and there may or may not be generalized methods for identifying those arbitrage opportunities.
  • You might learn that there’s almost always a question of how to compensate stakeholders who are beneficiaries of the current system. You want to run a process that allows the people who benefited from the inadequate institution to save face, or even take the credit for the switch-over. Perhaps there are generalized mechanisms that have this property.
  • You might discover the conditions under which dominant assurance contracts can work well, and what their practical limits are.
  • You might find that there are repeatable methods for building alternative institutions in parallel to legacy institutions, which gradually take on more and more of the functions of the legacy institution such that there’s an incentive gradient towards a better equilibrium at every step.

Or maybe nothing like any of these. Perhaps when one tries to solve these problems in practice, they learn that this whole analytic frame of incentives and game theory is less productive than some other model.

But whatever form it ends up taking, it seems like there ought to be knowledge to discover here. I guess that the meta-institutional bottleneck is meaningfully a “thing”, and not just a catch-all term for a bunch of mostly unrelated, mostly idiosyncratic problems.

And to the extent that that’s true, there’s room for a community of entrepreneur-researchers iteratively trying to solve this kind of problem, sharing what works and what doesn’t, and collectively iterating to best practices, in the same way that there are currently known best practices for starting a startup.

Only one hump

It seems likely that the meta-institutional bottleneck is a one-time hurdle. If we’re able to grow past our existing institutions, we may not fall into that trap again.

For one thing, many of the better institutions that we want to install have built in mechanisms for iterating to even better institutions. If your society is regularly utilizing prediction markets to aggregate information, it becomes much easier to surface, validate, and create common knowledge of superior institution design. A society that is already using prediction and decision markets is much much better positioned to switch to something even better, if something like that is found. Our institutions are our meta-institutions.

But even aside from that, if we do discover general, repeatable, methods for moving from inadequate to adequate equilibria, we will forever be in a better situation for improving our institutions over the future. [ 2 ]

For instance, suppose that Balaji’s “network state” model proves successful, and in 25 years there are a handful of diplomatically recognized sovereign states that were founded in that way. This would probably be a great improvement on the object level, since I expect at least one of those states to incorporate modern institution design, demonstrating their effectiveness (or falsifying their effectiveness!) to the rest of the world. But even if that’s not the case, there would, in that future, be a known playbook by which groups of people could iteratively found sovereign states that conform to whatever ideals a large enough coalition buys into. 

We have a meta-institutional bottleneck in this generation, but if we solve it, in general, we may solve it forever. 

The long game

I guess that 60 to 100 years is enough time for a relatively small group of ~100 competent founder types to radically shift the incentives in society. 

This would be a slog, especially at first. But few individuals plan and build on decades-long time horizons. There’s an advantage to those that are playing a long game. 

And as some of these projects succeed, that would accelerate others. As noted above, there’s a flywheel effect where adequate information propagation and decision institutions disproportionately favor other adequate information propagation and decision institutions.

I could imagine a community of people who share this ideology. They would identify specific inadequate equilibria that could maybe be corrected with some combination of a clever plan and an annoying schlep. Individual founders would form small teams to execute a plan to try to shift that equilibrium.

Some examples of the sorts of projects that we might push on:

  • Normalizing prediction markets
  • Attempting to found network states
  • Reforming existing government processes.
  • Reforming academia 
  • Building expertise identification and peer-review processes outside of academia
  • Building platforms that recreate the functionality of local governance (eg a community governance platform to evaluate and cast judgement on allegations of sexual harassment)
  • Building incentive-compatible outcome-tracking and ranking platforms (for medical professionals, for lawyers, etc.)
  • Generally doing the schlep to kickstarter every change that looks like a collective action problem.

Those teams would often fail, and write up post-mortems for the community to engage with. And occasionally they would succeed, to the point of pushing society over a threshold into some new stable equilibrium. In the process, they would find processes and build software tools for solving a specific problem, and then share them with the broader community, to adapt them to other problems.

Over time we’ll get a clearer sense of what common blockers to these institutional shifts are, and then figure out approaches to those blockers. For instance, as Robin Hanson has found, there isn’t actually demand for accurate information from prediction markets, because accurate information interferes with narrative-setting. And in general, most individuals don’t locally prioritize better outcomes over signaling their good qualities, doing the “done thing”, and avoiding blame. 

Noticing these blockers, we treat them as constraints on our exploration process. We need to find institutional reform paths that are themselves compatible with local incentives, just as a technologist steers towards a long-term vision by tacking close to what is currently technologically feasible. And it turns out that the local incentives are different than what one might have naively assumed. MetaMed’s failure and Robin’s non-traction on prediction markets both represent progress. [ 3 ]

We would be playing a decades-long game, which means that a crucial consideration would be building and maintaining a lineage that retains essential skill and strategic orientation, over several generations.

Over the course of a century, I think we could plausibly get to a society on planet earth whose core institutions are basically adequate. And from there, the world would be mostly optimized.

But…AI

The problem, of course, is that it doesn’t seem that we have even 30 years until the world is radically transformed and control of the future slips out of human hands. (Because of AI.)

This is what I would be spending my life on, if not for the rapidly approaching singularity.

I would prefer to be pushing on the meta-institutional bottleneck. It seems more rewarding, and more tractable, and a better fit for my skills and disposition. 

But I don’t think I live in that world. 

It’s possible that I should still be doing this. If the AI situation looks intractable enough, maybe the most dignity is in making peace with death, and then backing up and starting to build the kind of sane civilization that would have handled the problem well, if we had been faster. 


[ 1 ] – Some of the suckiness of the world is due to currently unavoidable circumstances: diseases like aging and cancer, for instance, suck. But even those problems are touched by our half-functional institutions. It is not currently the case that a given cancer patient gets the best treatment our planet could in principle afford given our current level of technology. If the norms of medical research were better, or even if the medical literature were better indexed, it would be clearer which treatments worked best. If medical professionals’ outcomes were tracked and scored, it would be possible to buy provably better treatment, which would incentivize diffusion of innovation. And so on.

[ 2 ] – This isn’t literally guaranteed. It could of course be the case that we discover methods that work well in our current world, but that our world changes radically enough that our tools and methods stop being applicable. 

It’s not out of the question that a few network states are successfully founded, before legacy states wise up and coordinate to close that option.

[ 3 ] – Indeed, Robin Hanson’s decades-long advocacy of prediction markets and MetaMed’s nascent attempt to reform medicine both seem to me like examples of the kinds of project that we might try to undertake. Neither succeeded, but both gave us more detail on the shape of the problem (noisily, since outcomes are very dependent on luck; a problem that destroyed a first attempt might be defeated by a second attempt).

I side with my psych bugs

[I wrote this something like 6 months ago]

  • I basically always side with my psych bugs. If I object to something in a psychy way, my attitude is generally, “I don’t understand you, or what you’re about, little psych bug, but I’m with you. I’m not going to let anyone or anything force us to do anything that you don’t want to do.”
  • Sometimes this is only one half of a stance, where the other half is a kind of clear-sighted sense of “but this isn’t actually the optimal behavior set, right? I could be doing better by my goals and values by doing something else?” But even in that case, my allyship of my psych bugs comes first.
  • I don’t throw myself under the bus just because I don’t understand the root of my own behavior patterns. And I don’t force myself to change just because I think I’ve understood. Deep changes to my psychology are almost always to be done from the inside out. The psychy part of me is the part that gets to choose if it wants to change. I might do CoZEs, but the psychy part of me gets to decide if it wants to do CoZEs and which ones to do.
  • Admittedly, there’s a bit of an art here. There are cases where it’s not a psychy objection, but simple fear of discomfort that is holding me back. Sometimes I’m shying away from doing another rep at the gym, or doing another problem on the problem set, not because I have a deep objection, but because it’s hard. And sometimes I don’t want to change some of my social behavior patterns because that would mean being outside my comfort zone, and so I’m avoiding it or rationalizing why not to change.
  • And for those I should just summon courage and take action anyway. (Though in that last example there, in particular, I want to summon compassion along with courage.)
  • There’s a tricky problem here of how to deal with psychy internal conflicts of various stripes. I don’t claim that this is the whole answer to that problem.
  • I want to explain a bit of why I have this stance. There are basically 3 things that can go wrong if I do a simple override of a psych bug. These are kind of overlapping. I don’t think they’re really one thing or another.

1. I’m disintegrated and I take ineffective action.

  • If something feels really bad to do, in a psychy way, and I try to do it anyway, then I’m fighting myself. My behavior is in practice going to be spastic as I alternate between trying to do X and trying to do not-X, or worse, trying to do X and not-X at the same time.
  • Recent example: I was interested in helping my friends with their CFAR workshops, but also thought that what they were doing was bad in an important way and incremental improvements to what they were doing would make things worse because I didn’t really trust the group. Even now, I’m struggling to put to paper exactly what the conflict was here.
  • So I ended up 1) not helping very much, because I was not full-heartedly committed to the project and 2) metaphorically stabbing myself in the foot as I repeatedly did things that seemed terrible to me in hard-to-explain ways.
  • And the way that I get integration is that I own all my values, not giving up on any of them, and I work to get to the root of what each one is protecting, and to figure out how to satisfy all of them.

2. I harm some part of my values.

  • My psych bugs are sort of like allergies to something, which are keeping me moored to the things that I care about. They flare up when it seems like the world is encroaching on something deeply important to me that I won’t be able to argue for or defend against.
  • Examples:
  • I was often triggered around some things with romance, and most people’s suggestions around romance. I sometimes expressed that it felt like other people were trying to gaslight me about what I want. There was something deep and important to me in this domain, and also it was fragile. I could lose track of it.
  • I have a deep aversion to doing drugs. (I once fainted from listening to descriptions of some benign drug experiences, and have left academic lectures on LSD because I felt like I was going to throw up.) I haven’t explored this in detail, but something in me sees something about (some) drugs, and is like “nope. nope. nope. Critically bad. Unacceptable.” Probably something about maintaining my integrity as an agent and a thinking thing.
  • It is really not ok, in either of these cases, to just overwrite myself here. Both of these “aversions” are keeping me moored to something that is deeply important to me, and that I do not want to lose, but which the world is exerting pressure against.
  • (The seed of a theory of triggeredness)

3. The territory strikes back.

  • Most importantly, sometimes the psychy part of me is smarter than me, or at least is managing to track some facet of reality that I’m not tracking. If I just do an override, even if it works in the sense of not putting me in a position where I’m fighting myself, reality will smack me in the face.
  • Some examples:
  • Most people hurt themselves / impair themselves in learning “social skills”, and the first commitment when learning “social skills” is to not damage yourself.
  • Note that that doesn’t mean that you should just do nothing. You still want to get stronger. But you want to do it in a way that doesn’t violate more important commitments.
  • One class of psych bug in me seems to be pointing at the Geeks, MOPs, and Sociopaths dynamic. I have miscalibrated (?) counter-reactions to anything that looks like letting MOPs or Sociopaths into things that I care about.
  • Those are real risks! Bad things happen when you do that! Shutting my eyes to my sense of “something bad is happening here” doesn’t help me avoid the problem.
  • In practice, it’s usually more of a blend of all three of these failure modes, and I don’t get to know from the outset how much of it is each. And also usually, there’s some overreaction or miscalibration mixed in. But, likewise, I don’t know from the outset what is miscalibration and what isn’t.
  • So, by default, when I have a psychy objection to something, I side with it, and defend it, while doing a dialog process, and trying to move closer to the Pareto frontier of my values.

Investing in wayfinding, over speed

A vibe of acceleration

A lot of the vibe of early CFAR (say 2013 to 2015) was that of pushing our limits to become better, stronger, faster. How to get more done in a day, how to become superhumanly effective.

We were trying to save the world, and we were in a race against Unfriendly AI. If CFAR made some of the people in this small community that focused on the important problems 10% more effective and more productive, then we would be that much closer to winning. [ 1 ]

(This isn’t actually what CFAR was doing if you blur your eyes and look at the effects, instead of following the vibe or specific people’s narratives. What CFAR was actually doing was mostly community building and culture propagation. But this is what the vibe was.)

There was sort of a background assumption that augmenting the EA team, or the MIRI team, increasing their magnitude, was good and important and worthwhile.

A notable example that sticks out in my mind: I had a meeting with Val, in which I said that I wanted to test his Turbocharging Training methodology, because if it worked “we should teach it to all the EAs.” (My exact words, I think.)

This vibe wasn’t unique to CFAR. A lot of it came from LessWrong. And early EA as a whole had a lot of this.

I think that partly this was tied up with a relative optimism that was pervasive in that time period. There was a sense that the stakes were dire, but that we were going to meet them with grim determination. And there was a kind of energy in the air, if not an endorsed belief, that we would become strong enough, we would solve the problems, and eventually we would win, leading into transhuman utopia.

Like, people talked about x-risk, and how we might all die, but the emotional narrative-feel of the social milieu was more optimistic: that we would rise to the occasion, and things would be awesome forever.

That shifted in 2016, with AlphaZero and some other stuff, when MIRI leadership’s timelines shortened considerably. There was a bit of “timelines fever”, and a sense of pessimism that has been growing since. [ 2 ]

My reservations

I still have a lot of that vibe myself. I’m very interested in getting Stronger, and faster, and more effective. I certainly have an excitement about interventions to increase magnitude.

But, personally, I’m also much more wary of the appeal of that kind of thing and much less inclined to invest in magnitude-increasing interventions.

That sort of orientation makes sense for the narrative of running a race: “we need to get to Friendly AI before Unfriendly AI arrives.” But given the world, it seems to me that that sort of narrative frame is mostly a bad fit for the actual shape of the problem.

Our situation is that…

1) No one knows what to do, really. There are some research avenues that individual people find promising, but there’s no solution-machine that’s clearly working: no approach that has a complete map of the problem to be solved.

2) There’s much less of a clean and clear distinction between “team FAI” and “team AGI”. It’s less the case that “the world saving team” is distinct from the forces driving us towards doom.

A large fraction of the people motivated by concerns of existential safety work for the leading AGI labs, sometimes directly on capabilities, sometimes on approaches that are ambiguously safety or capabilities, depending on who you ask.

And some of the people who seemed most centrally in the “alignment progress” cluster, the people whom I would have been most unreservedly enthusiastic to boost, have produced results that seem to have been counterfactually responsible for major hype-inducing capability advances. I don’t currently know that to be true, or (conditioning on it being true) know that it was net-harmful. But it definitely undercuts my unreserved enthusiasm for providing support for Paul. (My best guess is that it is still net-positive, and I still plan to seize opportunities I see to help him, if they arise, but less confidently than I would have 2 years ago.)

Going faster and finding ways to go faster is an exploit move. It makes sense when there are some systems (“solution machines”) that are working well, that are making progress, and we want them to work better, to make more progress. But there’s nothing like that currently making systematic progress on the alignment problem.

We’re in an exploration phase, not an execution phase. The thing that the world needs is people who are stepping back and making sense of things, trying to understand the problem well enough to generate ideas that have any hope of working. [ 3 ] Helping the existing systems, heading in the direction that they’re heading, to go faster…is less obviously helpful.

The world has much much more traction on developing AGI than it does on developing FAI. There’s something like a machine that can just turn the crank on making progress towards AGI. There’s no equivalent machine that can take in resources and make progress on safety.

Because of that, it seems plausible that interventions that make people faster, that increase their magnitude instead of refining their direction, disproportionately benefit capabilities.

I’m not sure that that’s true. It could be that capabilities progress marches to the drumbeat of hardware progress, and that everyone, including the outright capabilities researchers, moving faster relative to the growth in compute is a net gain. It effectively gives humanity more OODA loops on the problems. Maybe increasing everyone’s productivity is good.

I’m not confident in either direction. I’m ambivalent about the sign of those sorts of interventions. And that uncertainty is enough reason for me to think that investing in tools to increase people’s magnitude is not a good bet.

Reorienting

Does this mean that I’m giving up on personal growth or helping people around me become better? Emphatically not.

But it does change what kinds of interventions I’m focusing on.

I’m conscious of preferentially promoting the kinds of tech and the cultural memes that seem like they provide us more capacity for orienting, more spaciousness, more wisdom, more carefulness of thought. Methods that help us refine our direction, instead of increasing our magnitude.

A heuristic that I use for assessing practices and techniques that I’m considering investing in or spreading: “Would I feel good if this was adopted wholesale by DeepMind or OpenAI?”

Sometimes the answer is “yes”. DeepMind employees having better emotional processing skills, or having a habit of building lines of retreat, seems positive for the world. That would give the individuals and the culture more capacity to reflect, to notice subtle notes of discord, to have flexibility instead of the tunnel vision of defensiveness or fear.

These days, I’m aiming to develop and promote tools, practices, and memes that seem good by that heuristic.

I’m more interested in finding ways to give people space to think, than I am in helping them be more productive. Space to think seems more robustly beneficial.

To others

I’m writing this up in large part because it seems like many younger EAs are still acting in accordance with the operational assumption that “making EAs faster and more effective is obviously good.” Indeed, it seems so straightforward, that they don’t seriously question it. “EA is good, so EAs being more effective is good.”

If you, dear reader, are one of them, you might want to consider these questions over the coming weeks, and ask how you could distinguish between the world where your efforts are helping and the world where they’re making things worse.

I used to think that way. But I don’t anymore. It seems like “effectiveness” in the way that people typically mean it is of ambiguous sign, and what we’re actually bottlenecked on is wayfinding.


[ 1 ] – As a number of people noted at the time, the early CFAR workshop was non-trivially a productivity skills program. Certainly epistemology, calibration, and getting maps to reflect the territory were core to the techniques and ethos. But also a lot of the content was geared towards being more effective, not being blocked, setting habits, and getting stuff done, and only indirectly about figuring out what’s true (notable examples: TAPs, CoZE as exposure therapy, Aversion Factoring, Propagating Urges, GTD). To a large extent, CFAR was about making participants go faster and hit harder. And there was a sense of enthusiasm about that.

[ 2 ] – The high point of optimism was probably early 2015, when Elon Musk donated $10 million to the Future of Life Institute (“to the community”, as Anna put it, at my CFAR workshop of that year). At that point I think people expected him to join the fight.

And then Elon founded OpenAI instead.

I think that this was the emotional turning point for some of the core leaders of the AI-risk cause, and that shift in emotional tenor leaked out into community culture.

[ 3 ] – To be clear, I’m not necessarily recommending stepping back from engagement with the world. Getting orientation usually depends on close, active, contact with the territory. But it does mean that our goal should be less to affect the world, and more to just improve our own understanding enough that we can take action that reliably produces good results.

My current summary of the state of AI risk

Here’s the current, gloomy, state of AI risk:

Scaling

AI capabilities have made impressive progress in the past decade, and particularly in the past 3 years. Deep Learning has passed over the threshold from “interesting and impressive technical achievement” (AlphaGo) to “practically and commercially useful” (DALL-E 2, ChatGPT).

Not only are AI capabilities out of the “interesting demonstration” phase, they have been getting more general. Large Language Models are capable of a wide range of cognitive tasks, while AlphaGo only plays Go.

That progress has been driven almost entirely by more data and more compute. We mostly didn’t come up with clever insights. We just took our old algorithms and made them bigger. That you can get increasing capability and increasing generality this way is suggestive that you can get transformative, or superhuman, AI systems by just continuing to turn up the “size” and “training time” dials.
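
To make the “turn up the dials” picture slightly more concrete, here is a minimal toy sketch in the spirit of published scaling-law results. The power-law shape is the standard one from that literature, but every constant below is invented for illustration and does not describe any real model:

```python
# Toy illustration of "just make it bigger": loss falls as a power law in model
# size and data. The constants are made up for illustration, not fit to anything.

def toy_loss(params: float, tokens: float) -> float:
    """Hypothetical loss as a power law in parameter count and training tokens."""
    return 1.7 * params ** -0.07 + 1.9 * tokens ** -0.09 + 0.1  # 0.1 = irreducible floor

for scale in (1, 10, 100, 1000):
    n, d = 1e9 * scale, 2e10 * scale  # turn both dials together
    print(f"{scale:>5}x scale -> toy loss ~ {toy_loss(n, d):.3f}")
```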

And because of this dynamic, there is no one, in the whole world, who knows how these systems work.

Misalignment

Modern systems display many of the specific, alignment-failure phenomena that were discussed as theoretical ideas in the AI x-risk community before there were real systems to look at.

I think that is worth emphasizing: The people who thought in 2010 that AGI would most likely destroy the world predicted specific error modes of AI systems. We can see those error modes in our current systems.

Currently, no one on earth has working solutions for these error modes.

One of the leading AI labs has published an alignment plan. It makes no mention of many of the most significant specific problems. It also relies heavily-to-exclusively on a training method that is possibly worse than useless, because it incentivizes deception and manipulation.

Indeed the document might be uncharitably(?) summarized as “We agree that we’re playing with an extremely dangerous technology that could destroy the world. Our plan is to cross our fingers and hope that our single safety technique will scale well enough that we can have the powerful AI systems themselves help us solve most of the hard parts of the problem”, without giving any argument at all for why that is a reasonable assumption, much less an assumption worth risking the whole world on.

A related take.

Most AI labs have no published alignment plan at all.

MIRI, the original AI alignment group, the one that first pointed out the problem, that has been working on it the longest, and that (according to some voices) took the problem most seriously, has all but given up, and urges us not to set our hopes on survival.

They say that they made a desperate effort to solve the core technical problems of alignment, failed, and don’t have any plan for how to proceed. The organization has largely dispersed: a majority of the technical staff have left the org, and most of the senior researchers are not currently working on any technical agenda.

There are around 100 to 200 people working seriously on the technical problems of alignment (compared to ~1500 technical researchers at the current leading AI labs, and the ~50,000 people working on machine learning more generally). Some of them have research trajectories that they consider promising, and are more optimistic than MIRI. But none of them currently has anything like a complete, non-speculative alignment plan.

To my knowledge, the most optimistic people who work full time on alignment put at least a double-digit probability on AI destroying the world. [Do note the obvious selection effect here, though. If a person thinks the risks are sufficiently low, they probably don’t think much about the topic.]

Recent AI advances have opened up new kinds of more empirical research, partly because Large Language Models are enough like AGI that they can serve as a toy model for trying some alignment strategies.

There’s 100x as much effort going into adversarial training and interpretability research as there was 5 years ago. Maybe that will bear practically-relevant fruit.

(This market thinks that there’s a 37% chance that, by the end of 2026, interpretability tools will give us any understanding of how Large Language Models do any of the magic that we can’t get with other algorithms.)

Some people are optimistic about those approaches. They are much more concrete, in some sense, than research that was being done 5 years ago. But it remains to be seen what will come of this research, and how well these approaches, if they work, will scale to increasingly large and powerful systems.

Policy

There are a handful of people working on “AI policy”, or attempting to get into positions of power in government to guide public policy. The most important thing about AI policy is that there are currently approximately no workable AI policy ideas that both help with the biggest problems and are at all politically feasible.

Maybe we could set up a HUGE tax on training AI systems that slows down progress a lot, to buy us time? Or slow things down with regulations?

You might like to make advanced AI research straight up illegal, but on my models, there are a bunch of problems in the implementation details of a policy like that. If AI progress mostly comes from more compute…then we would have to put a global ban on making more computer chips? Or on using too big of a computing cluster?

Something like that probably would put Earth in a better position (though it doesn’t solve the problem, only buys us time). But a policy like that requires both enormous political might (you’re just going to end a bunch of industries, by fiat) and a kind of technical precision that lawmakers virtually never have.

And as Eliezer points out, we just had a global pandemic that was probably the result of gain-of-function research. Gain-of-function research entails making diseases more virulent or dangerous on purpose. It doesn’t have big benefits, and is supported mostly by a handful of scientists who do that work for reasons that egregiously fail cost-benefit analysis.

But the world has not banned gain-of-function research, as obvious as that would be to do.

If even that hasn’t happened, it seems utterly implausible that governments will successfully ban making or using big computers, given that the whole world uses computers, and that there are enormous economic forces opposing any such ban.

Field expansion

Over the past 10 years, an increasing number of people have started treating AI safety as a real subject of concern. That’s arguably good, but more people agreeing that the problem is important is not helpful if there are no tractable solutions for them to contribute to.

This has caused many more new, young researchers to enter the field. Mostly these new people are retreading old ideas without realizing it, and are accordingly more optimistic. The MIRI old guard say that almost every one of these people (with a small number of exceptions that can be counted on one hand) is dodging the hard part of the problem. Despite the influx of new people, they say, no one is actually engaging with the hard problem of alignment at all. It’s not that they’re trying and failing. They’re not even doing the kind of work that could make progress.

The influx of people has so far not led to many new insights. Perhaps we only need to give it time, and some of those people will blossom, but I’ll also note that 4 of the top 5 most promising/impressive alignment researchers had already gotten into the field by 2015. I think that there is a strong correlation between doing good work and having identified the problem early / being compelled by chains of technical and abstract moral reasoning. I think it is likely that there will not be another researcher who produces alignment ideas of the quality of Paul Christiano’s in the next 10 years. I think Paul is likely the best we’re going to get.

I can think of exactly one up-and-coming person who might grow to be that caliber of researcher. (Though the space is so empty that one more person like that would be something like a 20 or 30% increase in our planet’s total serious effort on the hard problems that MIRI claims almost everyone fails to engage with.)

There is now explicit infrastructure to teach and mentor these new people, though, and that seems great. It had seemed for a while that the bottleneck for people coming to do good safety research was mentorship from people who already have some amount of traction on the problem. Someone noticed this and set up a system to make it as easy as possible for experienced alignment researchers to mentor as many junior researchers as they want to, without needing to do a bunch of assessment of candidates or to deal with logistics. Given the state of the world, this seems like an obvious thing to do.

I don’t know that this will actually work (especially if most of the existing researchers are themselves doing work that dodges the core problem), but it is absolutely the thing to try for making more excellent alignment researchers doing real work. And it might turn out that this is just a scalable way to build a healthy field. I’m grateful for and impressed by the SERI MATS team for making this happen.

A sizable fraction of these new people, sincerely motivated by AI safety concerns, go to work on AI capabilities at AGI labs. Many of the biggest improvements in AI capabilities over the past year (RLHF enabling ChatGPT, in particular) have been the direct result of work done by people motivated by AI safety. It is a regular occurrence that I talk to someone who wants to help, and their plan is to go work for one of the AGI labs actively rushing to build AGI. Usually this is with the intention of “nudging” things toward safety, with no more detailed plan than that. Sometimes people have a more detailed model that involves doing specific research that they believe will help (often research in an ambiguous zone that is regarded as “capabilities” by some, and “safety” by others).

All three of the leading AI labs are directly causally downstream of intervention from AI safety folk. Two of the labs would definitely not have been started without our action, and the remaining one is ambiguous. It continues to be the case that the AI safety movement drives interest, investment, and talent into developing more and more advanced AI systems, with the explicit goal of building AGI.

(This is very hard to assess, just as all historical counterfactuals are hard to assess, but it seems likely to me that, overall, the net effect of all the people trying to push on AI safety over the past 20 years has been to make the world less safe, by accelerating AI timelines while barely making any technical progress on alignment.)

The future

As AI capabilities grow, the hype around AI increases. More money, compute, and smart research effort is spent on making more and better AI every year.

(I hear that ChatGPT caused Google Brain to go “code red”. My understanding is that, previously, the culture of Google Brain had been skeptical of AGI, treating it as pie-in-the-sky fantasy by unserious people. But the release of ChatGPT caused them to have emergency meetings, pulling researchers away from NeurIPS to discuss their strategy pivots.)

No one knows when transformative AI will arrive. But I don’t know a single person whose estimated timeline got longer in the past 3 years, and I can think of dozens of people whose timelines shrank. And of those that didn’t change, their timelines were already short.

The trend has definitely been toward taking nearer-term possibilities more seriously.

5 years is possible. 10 years is likely. 30 years would be surprising.

The world is just starting to take an extremely wild ride. We’re extremely unprepared for it, in terms of technical safety and in terms of our society’s ability to adapt gracefully to the shock.

15 years ago, some people first started pointing out that this trajectory would likely end in extinction. It seems like in almost every respect, the situation has gotten worse since then, not better.

I expect the situation to continue to worsen, as the AI capabilities -> AI hype -> AI capabilities cycle accelerates, and as a garbled, lowest-common-denominator version of AI safety becomes a talking point on Fox News, etc., and a tribal battleground of the culture war.

The situation does not look good.

A note on “instrumental rationality”

[The following is a footnote that I wrote in a longer document.]

I’m focusing on epistemic rationality. That’s not because instrumental rationality isn’t real, or isn’t part of the art, but because focusing on instrumental rationality tends to lead people astray. Instrumental rationality has a way of growing to absorb any and all self-help, which dilutes the concept to uselessness. “Does it help you win?” If so, then it’s instrumentally rational! [ 1 ]

While the art cannot exist for its own sake (it must be in service of some real goal), I claim that the motions of attempting to systematically change one’s map to reflect the territory are central to the kinds of systematized winning that are properly called “rationality.”

I declare that rationality is the way of winning by way of the map that reflects the territory.

There may very well be other arts that lead to more-or-less domain-general systematic winning by another mechanism, either orthogonal to rationality (e.g. good sleep habits, spaced repetition, practices to increase one’s willpower) or actively counter to rationality (e.g. intentionally delusional self confidence). Not all practices, or even all mental practices, that contribute to success ought to be called “rationality”. [ 2 ]

The ontological commitment that all practices that produce success should count as rationality commits one to either adopting anti-epistemology and not-epistemology as part of rationality, as long as they work, or distorting one’s categories to deny that those practices “actually work.”  This seems like an epistemic error, first and foremost, and/or is a pragmatically unhelpful categorization for people that want to coordinate on a rationality project.

These failure modes are not hypothetical. I’ve observed people label any and all productivity hacks or cool “mind practices” as rationality (often without much evaluation of whether they do, in fact, help you win, much less whether they help you attain more accurate beliefs). And I’ve likewise observed people deny that Donald Trump is successful at accomplishing his goals.

It might be that there are arts that work which run counter to rationality, and that I am giving up their potential power in the cultivation of my art. If so, I would like to see that clearly.

Rationality refers to a specific set of practices and virtues. There are other practices and other virtues, many of which are worth cultivating. But we do ourselves a disservice by calling them all “rationality.”

And, further, I make the empirical pedagogical claim that the way to instrumental rationality, as I am using the term, is through the diligent practice of epistemic rationality. So that is where would-be rationality developers should focus their efforts, at least at first.


[ 1 ] – Note the difference between “it is instrumentally rational” and “it is instrumental rationality.” And, as is often the case, Eliezer presaged this years ago. But in practice, these tend to bleed together. Helpful, or just cool, practices get absorbed into “rationality”, because rationalists are disproportionately the kind of people that like playing around with mental techniques. I am too!

Further, I think Eliezer’s criterion for instrumental rationality, read literally, is not strict enough, since it could include, in principle, Mythic Mode, or affirmations, or using your chakras, or Tarot reading, as “cognitive algorithms”. And in practice, these do get included under the term.

(And maybe on due consideration, we will think that chakras, or any other bit of woo, are meaningfully real, and that practices that depend on them are properly part of the art of the map that reflects the territory! I’m not ontologically committed to them not being part of the art, either. But their being “real” is not a sufficient criterion for being included in the art.)

[ 2 ] – A concept that is useful here is “applied psychology”. Spaced Repetition, Trigger Action Plans, or social accountability are applied psychology. Saying oops, murphyjitsu, Fermi estimates, or a particular TAP to ask for an example are (applied) rationality. I rely on many practices that are applied psychology but not applied rationality.

I don’t claim that this is a perfectly crisp distinction. Most applied rationality does depend on some applied psychology for implementation. But I think it is helpful to recognize that not all techniques that involve your mind are “rationality.”

What is of increasing marginal value, and what is of decreasing?

See also: this comment

There’s a key question that should govern the way we engage with the world: “Which things have increasing marginal returns, and which things have decreasing marginal returns?”

Sometimes people are like “I’ll compromise between what I like doing and what has impact, finding something that scores pretty good on both.” Or they’ll say, “I was already planning to get a PhD in [field] / run a camp for high schoolers / do personal development training / etc. I’ll make this more EA on the margin.”

They are doomed. There’s a logistic success curve, and there are orders of magnitude differences in the impact of different interventions. Which problem you work on is by far the most important determinant of your impact on the world, since most things don’t matter very much at all. The difference between the best project you can find and some pretty good project is often so large as to swamp every other choice that you make. And within a project, there are a bunch of choices to be made, which themselves can make orders of magnitude of difference, often the difference between “this is an amazing project” and “this is basically worthless.”

By deciding to compromise, to only half-way optimize, you’re knowingly throwing away most of your selection pressure. You’ve lost pretty much all your hope of doing anything meaningful. In a domain where the core problems are not solved by additive incremental improvements, a half-assed action rounds down to worthless.
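
Here is a toy sketch of that point. It assumes a log-normal spread of intervention impacts, which is an illustrative assumption rather than a measurement of the real distribution:

```python
# Toy model of "orders of magnitude differences in impact": compare the best
# option you can find against a "pretty good on both" compromise option.
import random

random.seed(0)
impacts = sorted(random.lognormvariate(0.0, 2.5) for _ in range(1000))

best = impacts[-1]                      # the best option you found
pretty_good = impacts[int(0.7 * 1000)]  # a 70th-percentile compromise

print(f"best / pretty-good impact ratio: {best / pretty_good:,.0f}x")
# With a heavy-tailed spread, the compromise captures only a sliver of the value
# of the best option; what you choose to work on swamps everything else.
```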

On the other hand, sometimes people push themselves extra hard to work one more hour at the end of the day when they’re tired and flagging. Often, but not always, they are pwned. Your last hour of the day is not your best hour of the day. Very likely, it’s your worst hour. For most of the work that we have to do, higher quality hours are superlinearly more valuable than lower quality hours. Slowly burning yourself out, or limiting your rest (which might curtail your highest quality hours tomorrow), so that you can eke out one more low quality hour of work, is a bad trade. You would be better off not worrying that much about it, and you definitely shouldn’t be taking on big costs for “optimizing your productivity” if “optimizing your productivity” is mainly about getting in increasingly low marginal value work hours.
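
And in the same toy spirit, a sketch of the “one more tired hour” trade, assuming (purely for illustration) that an hour’s value grows with the square of its quality:

```python
# Toy model of superlinear hour quality: value ~ quality^2 (an assumed curve).

def hour_value(quality: float) -> float:
    """Assumed value of a work hour given its quality, on a 0-to-1 scale."""
    return quality ** 2

tired_hour_tonight = hour_value(0.3)       # grinding out a low-quality hour now
lost_peak_hour_tomorrow = hour_value(1.0)  # the fresh hour it costs you tomorrow

print(f"value gained tonight:  {tired_hour_tonight:.2f}")
print(f"value lost tomorrow:   {lost_peak_hour_tomorrow:.2f}")
# Under any superlinear value-of-quality curve, this is a losing trade.
```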

Some variables have increasing marginal returns. We need to identify those, so that we can optimize on them as aggressively as we can, including making hard sacrifices to climb a little higher on the ladder.

Some variables have decreasing marginal returns. We want to identify those so that we can sacrifice on them, and otherwise not spend much attention or effort on them.

Getting which is which right is basically crucial. Messing up in either direction can leak most of the potential value. Given that, it seems like more people attempting to be ambitiously altruistic should be spending more cognitive effort trying to get this question right.

My policy on attempting to get people cryopreserved

My current policy: If, for whatever reason, I have been allocated decision-making power over what to do with a person’s remains, I will, by default, attempt to get them cryopreserved. But if they expressed a different preference while alive, I would honor that preference.

For instance, if [my partner] was incapacitated right now, and legally dead, and I was responsible for making that decision, I would push to cryopreserve her. This is not a straightforward extrapolation of her preferences, since, currently, she is not opposed in principle, but doesn’t want to spend money that could be allocated for a better altruistic payoff. But she’s also open to being convinced. If, after clear consideration, she prefers not to be cryopreserved, I would respect and act on that preference. But if I needed to make a decision right now, without the possibility of any further discussion, I would try to get her cryopreserved.

(Also, I would consider whether there was an acausal trade that I could make with her values as I understand them, such that those values would benefit from the situation, attempting to simulate how the conversation that we didn’t have would have gone. But I don’t commit to fully executing her values as I currently understand them. In places of ambiguity, I would err on the side of what I think is good to do, from my own perspective. That said, after having written the previous sentence, I think it is wrong, in that it doesn’t pass the golden-rule test of what I would hope she would do if our positions were reversed. That suggests that, on general principles, when I am deciding on behalf of a person, I should attempt to execute their values as faithfully as I can (modulo my own clearly stated ethical injunctions), and if I want something else, to attempt to acausally compensate their values for the trade… That does seem like the obviously correct thing to do.

Ok. I now think that that’s what I would do in this situation: cryopreserve my partner, in part on behalf of my own desire that she live and in part on behalf of the possibility that she would want to be cryopreserved on further reflection, had she had the opportunity for further reflection. And insofar as I am acting on behalf of my own desire that she live, I would attempt to make some kind of trade with her values such that the fraction of probability in which she would have concluded that this is not what she wants, had she had more time to reflect, is appropriately compensated, somehow.

That is a little bit tricky, because most of my budget is already eaten up by optimization for the cosmic altruistic good, so I’m not sure what I would have to trade that I counterfactually would not have given anyway. And the fact that I’m in this situation suggests that I actually do need more of a slack budget that isn’t committed to the cosmic altruistic good, so that I have a budget to trade with. But it seems like something weird has happened if considering how to better satisfy my partner’s values has resulted in my generically spending less of my resources on what my partner values, as a policy. So it seems like something is wonky here.)

Same policy with my family: If my dad or sister was incapacitated, soon to be legally dead, I would push to cryopreserve him/her. But if they had seriously considered the idea and decided against it, for any reason, including reasons that I think are stupid, I would respect, and execute, their wishes. [For what it is worth, my dad doesn’t have “beliefs” exactly, so much as postures, but the last time he mentioned cryonics, he said something like “I’m into cryonics”/“I think it makes sense.”]

This policy is in part because I guess that cryonics is the right choice and in part because this option preserves optionality in a way that the other doesn’t. If a person is undecided, or hasn’t thought about it much, I want to pick the reversible option for them.

[Indeed, this is mostly why I am signed up myself. I suspect that the philosophy of the future won’t put much value on personal identity. But also, it seems crazy to permanently lock in a choice on the basis of philosophical speculations, produced with my monkey brain, in a confused pre-intelligence-explosion civilization.]

Separately, if a person expressed a wish to be cryopreserved, including casually in conversation (e.g. “yeah, I think cryonics makes sense”), but hadn’t filled out the goddamn forms, I’ll spend some budget of heroics on trying to get them cryopreserved.

I have now been in that situation twice in my life. :angry: Sign up for cryonics, people! Don’t make me do a bunch of stressful coordination and schlepping to get you a much worse outcome than if you had just done the paperwork. I do not think it is ok to push for cryopreservation unless one of these conditions (I have been given some authority to decide, or the person specifically requested cryo) obtains. I think it is not ok to randomly seize control of what happens to a person’s remains, counter to their wishes, because you think you know better than they did.