I side with my psych bugs

[I wrote this something like 6 months ago]

  • I basically always side with my psych bugs. If I object to something in a psychy way, my attitude is generally, “I don’t understand you, or what you’re about, little psych bug, but I’m with you. I’m not going to let anyone or anything force us to do anything that you don’t want to do.”
  • Sometimes this is only one half of a stance, where the other half is a kind of clear-sighted sense of “but this isn’t actually the optimal behavior set, right? I could be doing better by my goals and values by doing something else?” But even in that case, my allyship of my psych bugs comes first.
  • I don’t throw myself under the bus just because I don’t understand the root of my own behavior patterns. And I don’t force myself to change just because I think I’ve understood. Deep changes to my psychology are almost always to be done from the inside out. The psychy part of me is the part that gets to choose if it wants to change. I might do CoZEs, but the psychy part of me gets to decide if it wants to do CoZEs and which ones to do.
  • Admittedly, there’s a bit of an art here. There are cases where it’s not a psychy objection, but simple fear of discomfort that is holding me back. Sometimes I’m shying away from doing another rep at the gym, or doing another problem on the problem set, not because I have a deep objection, but because it’s hard. And sometimes I don’t want to change some of my social behavior patterns because that would mean being outside my comfort zone, and so I’m avoiding it or rationalizing why not to change.
  • And for those I should just summon courage and take action anyway. (Though in that last example there, in particular, I want to summon compassion along with courage.)
  • There’s a tricky problem here of how to deal with psychy internal conflicts of various stripes. I don’t claim that this is the
  • I want to explain a bit of why I have this stance. There are basically 3 things that can go wrong if I do a simple override of a psych bug. These are kind of overlapping. I don’t think they’re really one thing or another.

1. I’m disintegrated and I take ineffective action.

  • If something feels really bad to do, in a psychy way, and I try to do it anyway, then I’m fighting myself. My behavior is in practice going to be spastic as I alternate between trying to do X and trying to do not X, or worse, try and do X and not X at the same time.
  • Recent example: I was interested in helping my friends with their CFAR workshops, but also thought that what they were doing was bad in an important way and incremental improvements to what they were doing would make things worse because I didn’t really trust the group. Even now, I’m struggling to put to paper exactly what the conflict was here.
  • So I ended up 1) not helping very much, because I was not full-heartedly committed to the project and 2) metaphorically stabbing myself in the foot as I repeatedly did things that seemed terrible to me in hard-to-explain ways.
  • And the way that I get integration is I own all my values, not giving up on any of them, and I work to get to the root of what each one is protecting, and to figure out how to satisfy all of them. It

2. I harm some part of my values.

  • My psych bugs are sort of like allergies to something, which are keeping me mored to the things that I care about. They flare up when it seems like the world is encroaching on something deeply important to me that I won’t be able to argue for or defend against.
  • Examples:
  • I was often triggered around some things with romance, and most people’s suggestions around romance. I sometimes expressed that it felt like other people were trying to gaslight me about what I want. There was something deep and important to me in this domain, and also it was fragile. I could loose track of it.
  • I have a deep aversion to doing drugs. (I once fainted from listening to some benign drug experiences, and have left academic lectures on LSD, because I felt like I was going to throw up). I haven’t explored this in detail, but something in me sees something about (some) drugs, and is like “nope. nope. nope. Critically bad. Unacceptable.” Probably something about maintaining my integrity as an agent and a thinking thing.
  • It is really not ok, in either of these cases to just overwrite myself here. Both of these “aversions” are keeping me mored to something that is deeply important to me, and that I do not want to loose, but which the world is exerting pressure against.
  • (The seed of a theory of triggeredness)

3. The territory strikes back.

  • Most importantly, sometimes the psychy part of me is smarter than me, or at least is managing to track some facet of reality that I’m not tracking, and if I just do an override, even if this worked in the sense of not putting me in a position where I’m fighting myself, reality will smack me in the face.
  • Some examples:
  • Most people hurt themselves / impair themselves in learning “social skills”, and the first commitment when learning “social skills” is to not damage yourself.
  • Note that that doesn’t mean that you should just do nothing. You still want to get stronger. But you want to do it in a way that doesn’t violate more important commitments.
  • One class of psych bug in me seems to be pointing at the Geeks, Mops, Sociopaths dynamic. I have miscalibrated (?) counter-reactions to anything that looks like letting in MOPs or Sociopaths into things that I care about.
  • Those are real risks! Bad things happen when you do that! Shutting my eyes to my sense of “something bad is happening here” doesn’t help me avoid the problem.
  • In practice, it’s usually more of blend of all three of these failure modes, and I don’t get to know from the outset how much of it is each. And also usually, there’s some overreaction or miscalibration mixed in. But, likewise, I don’t know from the outset what is miscalibration and what isn’t.
  • So, by default, when I have a psychy objection to something, I side with it, and defend it, while doing a dialog process, and trying to move closer to the pareto frontier of my values.

Investing in wayfinding, over speed

A vibe of acceleration

A lot of the vibe of early CFAR (say 2013 to 2015) was that of pushing our limits to become better, stronger, faster. How to get more done in a day, how to become superhumanly effective.

We were trying to save the world, and we were in a race against Unfriendly AI. If CFAR made some of the people in this small community that focused on the important problems 10% more effective and more productive, then we would be that much closer to winning. [ 1 ]

(This isn’t actually what CFAR was doing if you blur your eyes and look at the effects, instead of following the vibe or specific people’s narratives. What CFAR was actually doing was mostly community building and culture propagation. But this is what the vibe was.)

There was sort of a background assumption that augmenting the EA team, or the MIRI team, increasing their magnitude, was good and important and worthwhile.

A notable example that sticks out in my mind: I had a meeting with Val, in which I said that I wanted to test his Turbocharging Training methodology, because if it worked “we should teach it to all the EAs.” (My exact words, I think.)

This vibe wasn’t unique to CFAR. A lot of it came from LessWrong. And early EA as a whole had a lot of this.

I think that partly this was tied up with a relative optimism that was pervasive in that time period. There was a sense that the stakes were dire, but we were going to meet it with grim determination. And there was a kind of energy in the air, if not an endorsed belief, that we would become strong enough, we would solve the problems, and eventually we would win, leading into transhuman utopia.

Like, people talked about x-risk, and how we might all die, but the emotional narrative-feel of the social milieu was more optimistic: that we would rise to the occasion, and things would be awesome forever.

That shifted in 2016, with AlphaZero and some other stuff, when a MIRI leadership’s timelines shortened considerably. There was a bit of “timelines fever”, and a sense of pessimism that has been growing since. [ 2 ]

My reservations

I still have a lot of that vibe myself. I’m very interested in getting Stronger, and faster, and more effective. I certainly have an excitement about interventions to increase magnitude.

But, personally, I’m also much more wary of the appeal of that kind of thing and much less inclined to invest in magnitude-increasing interventions.

That sort of orientation makes sense for the narrative of running a race: “we need to get to Friendly AI before Unfriendly AI arrives.” But given the world, it seems to me that that sort of narrative frame is mostly a bad fit for the actual shape of the problem.

Our situation is that…

1) No one knows what to do, really. There are some research avenues that individual people find promising, but there’s no solution-machine that’s clearly working: no approach that has a complete map of the problem to be solved.

2) There’s much less of a clean and clear distinction between “team FAI” and “team AGI”. It’s less the case that “the world saving team” is distinct from the forces driving us towards doom.

A large fraction of the people motivated by concerns of existential safety work for the leading AGI labs, sometimes directly on capabilities, sometimes on approaches that are ambiguously safety or capabilities, depending on who you ask.

And some of the people who seemed most centrally in the “alignment progress” cluster, the people whom I would have been most unreservedly enthusiastic to boost, have produced results that seem to have been counterfactual to major hype-inducing capability advances. I don’t currently know that to be true, or (conditioning on it being true) know that it was net-harmful. But it definitely undercuts my unreserved enthusiasm for providing support for Paul. (My best guess is that it is still net-positive, and I still plan to seize opertunities I see to help him, if they arise, but less confidently than I would have 2 years ago.)

Going faster and finding ways to go faster is an exploit move. It makes sense when there are some systems (“solution machines“) that are working well, that are making progress, and we want them to work better, to make more progress. But there’s nothing like that currently making systematic progress on .

We’re in an exploration phase, not an execution phase. The thing that the world needs is people who are stepping back and making sense of things, trying to understand the problem well enough to generate ideas that have any hope of working. [ 3 ] Helping the existing systems, heading in the direction that they’re heading, to go faster…is less obviously helpful.

The world has much much more traction on developing AGI than it does on developing FAI. There’s something like a machine that can just turn the crank on making progress towards AGI. There’s no equivalent machine that can take in resources and make progress on safety.

Because of that, it seems plausible that interventions that make people faster, that increase their magnitude instead refining their direction, disproportionately benefit capabilities.

I’m not sure that that’s true. It could be that capabilities progress marches to the drumbeat of hardware progress, and everyone including the outright capabilities researchers moving faster relative to growth in compute is a net gain. It effectively gives humanity more OODA loops on the problems. Maybe increasing everyone’s productivity is good.

I’m not confident in either direction. I’m ambivalent about the sign of those sorts of interventions. And that uncertainly is enough reason for me to think that investing tools to increase people’s magnitude is not a good bet.


Does this mean that I’m giving up on personal growth or helping people around me become better? Emphatically not.

But it does change what kinds of interventions I’m focusing on.

I’m conscious of deferentially promoting the kinds of tech and the cultural memes that seem like they provide us more capacity for orienting, more spaciousness, more wisdom, more carefulness of thought. Methods that help us refine our direction, instead of increase our magnitude.

A heuristic that I use for assessing practices and techniques that I’m considering investing in or spreading: “Would I feel good if this was adopted wholesale by DeepMind or OpenAI?”

Sometimes the answer is “yes”. DeepMind employees having better emotional processing skills, or having a habit of building lines of retreat, seems positive for the world. That would give the individuals and the culture more capacity to reflect, to notice subtle notes of discord, to have flexibility instead from a the tunnel vision of defensiveness or fear.

These days, I’m aiming to develop and promote tools, practices, and memes, that seem good by that heuristic.

I’m more interested in finding ways to give people space to think, than I am in helping them be more productive. Space to think seems more robustly beneficial.

To others

I’m writing this up in large part because it seems like many younger EAs are still acting in accordance with the operational assumption that “making EAs faster and more effective is obviously good.” Indeed, it seems so straightforward, that they don’t seriously question it. “EA is good, so EAs being more effective is good.”

If, you, dear reader, are one of them, you might want to consider these questions over the coming weeks, and ask how you could distinguish between the world where your efforts are helping and the world where they’re making things worse.

I used to think that way. But I don’t anymore. It seems like “effectiveness” in the way that people typically mean it is of ambiguous sign, and actually what we’re bottleneck on is wayfinding.

[ 1 ] – As a number of people noted at the time, the early CFAR workshop was non-trivially a productivity skills program. Certainly epistemology, calibration, and getting maps to reflect the territory were core to the techniques, and ethos. But also a lot of the content was geared towards being more effective, not being blocked, setting habits, and getting stuff done, and only indirectly about figuring out what’s true. (notable examples: TAPs, CoZE as exposure therapy, Aversion Factoring, Propagating Urges, GTD) To a large extent, CFAR was about making participants go faster and hit harder. And there was a sense of enthusiasm

[ 2 ] – The high point of optimism was probably early 2015, when Elon Musk donated 10 million to the future of life institute (“to the community” as Anna put it, at my CFAR workshop of that year). At that point I think people expected him to join the fight.

And then Elon founded OpenAI instead.

I think that this was the emotional turning point for some of the core leaders of the AI-risk cause, and that shift in emotional tenor leaked out into community culture.

[ 3 ] – To be clear, I’m not necessarily recommending stepping back from engagement with the world. Getting orientation usually depends on close, active, contact with the territory. But it does mean that our goal should be less to affect the world, and more to just improve our own understanding enough that we can take action that reliably produces good results.

My current summary of the state of AI risk

Here’s the current, gloomy, state of AI risk:


AI capabilities has made impressive progress in the past decade, and particularly in the past 3 years. Deep Learning has passed over the threshold from “interesting and impressive technical achievement” (AlphaGo), to “practically and commercially useful” (DALLE-2, ChatGPT).

Not only is AI capabilities out of the “interesting demonstration” phase, it has been getting more general. Large Language Models are capable of a wide range of cognitive tasks, while AlphaGo only plays go.

That progress has been driven almost entirely by more data and more compute. We mostly didn’t come up with clever insights. We just took out old algorithms, and made them bigger. That you can get increasing capability and increasing generality this way is suggestive that you can get transformative, or superhuman, AI systems, by just continuing to turn up the “size” and “training time” dials.

And because of this dynamic, there is no one, in the whole world, who knows how these systems work.


Modern systems display many of the specific, alignment-failure phenomena that were discussed as theoretical ideas in the AI x-risk community before there were real systems to look at.

I think that is worth emphasizing: The people who thought in 2010 that AGI would most likely destroy the world, predicted specific error modes of AI systems. We can see those error modes in our current systems.

Currently, no one on earth has working solutions for these error modes.

One of the leading AI labs has published an alignment plan. It makes no mention of many the most significant specific problems. It also relies heavily-to-exclusively on a training method that is possibly worse than useless, because it incentivizes deception and manipulation.

Indeed the document might be uncharitably(?) summarized as “We agree that we’re playing with an extremely dangerous technology that could destroy the world. Our plan is to cross our fingers and hope that our single safety technique will scale well enough that we can have the powerful AI systems themselves help us solve most of the hard parts of the problem”, without giving any argument at all for why that is a reasonable assumption, much less an assumption worth risking the whole world on.

A related take.

Most AI labs have no published alignment plan at all.

MIRI, the original AI alignment group, the one that first pointed out the problem, that has been working on it the longest, and who (according to some voices) took the problems most seriously, have all but given up, and urge us not to set our hopes on survival.

They say that they made a desperate effort to solve the core technical problems of alignment, failed, and don’t have any plan for how to proceed. The organization has largely dispersed: a majority of the technical staff have left the org, and most of the senior researchers are not currently working on any technical agenda.

There are around 100 to 200 people working seriously on the technical problems of alignment (compared to ~1500 technical researchers at the current leading AI labs, and the ~50,000 people working on Machine learning more generally). Some of them have research trajectories that they consider promising, and are more optimistic than MIRI. But none of them currently have anything like complete, non-speculative alignment plan.

To my knowledge, the most optimistic people who work full time on alignment have at least a double digit probability that AI will destroy the world. [Do note the obvious selection effect, here, though. If a person thinks the risks are sufficiently low, they probably don’t think much about the topic.]

Recent AI advances have opened up new kinds of more empirical research. Partly because Large Language Models are enough like AGI that they can be a toy model for trying some alignment strategies.

There’s 100x the effort going into adversarial training and interpretability research than there was 5 years ago. Maybe that will bear practically-relevant fruit.

(This market thinks that there’s a 37% chance that interpretability tools will give us any understanding of how Large Language Models do the any of the magic that we can’t get with other algorithms by the end of 2026.)

Some people are optimistic about those approaches. They are much more concrete, in some sense, than research that was being done 5 years ago. But it remains to be seen what will come of this research, and how well these approaches, if they work, will scale to increasingly large and powerful systems.


There are a handful of people working on “AI policy”, or attempting to get into positions of power in government to guide public policy. The most important thing about AI policy is that there are currently approximately no workable AI policy ideas that both help with the biggest problems, and are at all politically feasible.

Maybe we could set up a HUGE tax on training AI systems that slows down progress a lot, to buy us time? Or slow things down with regulations?

You might like to make advanced AI research straight up illegal, but on my models, there are a bunch problems in the implementation details of a policy like that. If AI progress mostly comes from more compute…then we would have to put a put a global ban on making more computer chips? Or using too big of a computing cluster?

Something like that probably would put Earth in a better position (though it doesn’t solve the problem, only buys us time). But a policy like that requires both enormous political might (you’re just going to end a bunch of industries, by fiat) and a kind of technical precision that law-makers virtually never have.

And as Eliezer points out, we just had a global pandemic that was probably the result of gain of function research. Gain of function research entails making diseases more virulent or dangerous on purpose. It doesn’t have big benefits, and is supported mostly by a handful of scientists who do that work, for reasons that egregiously fail cost-benefit analysis.

But the world has not banned gain-of-function research, as obvious as that would be to do.

If that doesn’t happen, it seems utterly implausible that the government will successfully ban making or using big computers, given that the whole world uses computers, and there are enormous economic forces opposing them.

Field expansion

Over the past 10 years, an increasing number of people have started treating AI safety as a real subject of concern. That’s arguably good, but more people who agree the problem is important is not helpful if there are not tractable solutions to the problems to contribute to.

This has caused many more new, young, researchers to enter the field. Mostly these new people are retreading old ideas, without realizing it, and accordingly more optimistic. The MIRI old guard say that almost every one of these people (with a small number of exceptions that can be counted on one hand), are dodging the hard part of the problem. Despite the influx of new people, no one is actually engaging with the actual, hard problem of alignment, at all, they say. It’s not that they’re trying and failing. They’re not even doing the kind of work that could make progress.

The influx of people has so far not lead to many new insights. Perhaps we only need to give it time, and some of those people will blossom, but I’ll also note that 4 of the top 5 most promising/impressive alignment researchers had already gotten into the field by 2015. I think that there is a strong correlation between doing good work and having identified the problem early / being compelled by chains of technical and abstract moral reasoning. I think it is likely that there will not be another researcher who produces alignment ideas of the quality of Paul Christiano’s in the next 10 years. I think Paul is likely the best we’re going to get.

I can think of exactly one up-and-coming person who might to be grow to be that caliber of researcher. (Though the space is so empty that one more person like that is something like a multiplier of 20 or 30% on our planet’s total serious effort on the hard problems that MIRI claims almost everyone fails to engage with.)

There is now explicit infrastructure to teach and mentor these new people though, and that seems great. It had seemed for a while that the bottleneck for people coming to do good safety research was mentorship from people that already have some amount of traction on the problem. Someone noticed this and set up a system to make it as easy as possible for experienced alignment researchers to mentor as many junior researchers as they want to, without needing to do a bunch of assessment of candidates or to deal with logistics. Given the state of the world, this seems like an obvious thing to do.

I don’t know that this will actually work (especially if most of the existing researchers are themselves doing work that dodges the core problem), but it is absolutely the thing to try for making more excellent alignment researchers doing real work. And it might turn out that this is just a scalable way to build a healthy field. I’m grateful for and impressed by the SERI MATS team for making this happen.

A sizable fraction of these new people, sincerely motivated by AI safety concerns, go to work on AI capabilities at AGI labs. Many of the biggest improvements in AI capabilities over the past year (RLHF enabling ChatGPT, in particular) have been the direct result of work done by people motivated by AI safety. It is a regular occurrence that I talk to someone who wants to help, and their plan is to go work for one of the AGI labs actively rushing to build AGI. Usually this is with the intention of “nudging” things toward safety, with no more detailed plan than that. Sometimes people have a more detailed model that involves doing specific research that they believe will help (often research in an ambiguous zone that is regarded as “capabilities” by some, and “safety” by others).

All three of the leading AI labs are directly causally downstream of intervention from AI safety folk. Two of the labs would definitely not have been started without our action, and remaining one is ambiguous. It continues to be the case that the AI safety movement drives interest, investment, and talent into developing more and more advanced AI systems, with the explicit goal of building AGI.

(This is very hard to assess, just as all historical counterfactuals are hard to assess, but it seems likely to me that, overall, the net effect of all the people trying to push on AI safety over the past 20 has been to make the world less safe, by accelerating AI timelines, and barely making any technical progress on alignment.)

The future

As AI capabilities grow, the hype around AI increases. More money, compute, and smart research effort is spent on making more and better AI every year.

(I hear that ChatGTP caused Google Brain to go “code red”. My understanding is that, previously the culture of Google Brain had been skeptical of AGI, treating it as pie-in-the-sky fantasy by unserious people. But the release of ChatGPT, caused them have emergency meetings, pulling researchers away from NeurIPS to discuss their strategy pivots.) 

No one knows when transformative AI will arrive. But I don’t know a single person who’s estimated timeline got longer, in the past 3 years. But I can think of dozens of people who’s timeline shrank. And of those that didn’t change, their timelines were already short.

The trend has definitely been taking nearer-term possibilities more seriously.

5 years is possible. 10 years is likely. 30 years would be surprising.

The world is just starting to take an extremely wild ride. We’re extremely unprepared for it, in terms of technical safety and in terms of our society’s ability to adapt gracefully to the shock.

15 years ago, some people first started pointing out that this trajectory would likely end in extinction. It seems like in almost every respect, the situation has gotten worse since then, not better.

I expect it to situation to continue to worsen, as the AI capabilities -> AI hype -> AI capabilities cycle accelerates, and as a garbled lowest-common-denominator version of AI safety becomes a talking point on Fox News, etc., and a tribal battleground of the culture war.

The situation does not look good.

A note on “instrumental rationality”

[The following is a footnote that I wrote in a longer document.]

I’m focusing on epistemic rationality. That’s not because instrumental rationality isn’t real, or isn’t part of the art, but because focusing on instrumental rationality tends to lead people astray. Instrumental rationality has a way of growing to absorb any and all self help, which dilutes the concept to uselessness. “Does it help you win?” If so, then it’s instrumentally rational! [ 1 ]

While the art cannot exist for its own sake, it must be in service of some real goal, I claim that the motions of attempting to systematically change one’s map to reflect the territory are central to the kinds of systematized winning that are properly called “rationality.”

I declare that rationality is the way of winning by way of the map that reflects the territory.

There may very well be other arts that lead to more-or-less domain-general systematic winning by another mechanism, either orthogonal to rationality (e.g. good sleep habits, spaced repetition, practices to increase one’s willpower) or actively counter to rationality (e.g. intentionally delusional self confidence). Not all practices, or even all mental practices, that contribute to success ought to be called “rationality”. [ 2 ]

The ontological commitment that all practices that produce success should count as rationality commits one to either adopting anti-epistemology and not-epistemology as part of rationality, as long as they work, or distorting one’s categories to deny that those practices “actually work.”  This seems like an epistemic error, first and foremost, and/or is a pragmatically unhelpful categorization for people that want to coordinate on a rationality project.

These failure modes are not hypothetical. I’ve observed people to label any and all productivity hacks or cool “mind practices” as rationality (often without much evaluation of whether they does, in fact, help you win, much less whether they helps you attain more accurate beliefs). And I’ve likewise observed people deny that Donald Trump is successful at accomplishing his goals.

It might be that there are arts that work that are counter to rationality, and that I give up the potential power in the cultivation of my art. If so, I would like to see that clearly.

Rationality refers to a specific set of practices and virtues. There are other practices and other virtues, many of which are worth cultivating. But we do ourselves a disservice by calling them all “rationality.”

And, further, I make the empirical pedagogical claim that the way to instrumental rationality, as I am using the term, is through the diligent practice of epistemic rationality. So that is where would-be rationality developers should focus their efforts, at least at first.

[ 1 ] –  Note the difference between “it is instrumentally rational” and “it is instrumental rationality.” And, as is often the case, Eliezer presaged this year ago. But in practice, these tend to bleed together. Helpful, or just cool, practices get absorbed into “rationality”, because rationalists, are disproportionally the kind of people that like playing around with mental techniques. I am too!

Further, I think Eliezer’s criterion for instrumental rationality, read literally, is not strict enough, since it could include, in principle, Mythic Mode, or affirmations, or using your chakras, or Tarrot reading, as “cognitive algorithms”. And in practice, these do get included under the term.

(And maybe on due consideration, we will think that chakras, or any other bit of woo, are meaningfully real, and that practices that depend on them are properly part of the art of the map that reflects the territory! I’m not ontologically committed to them not being part of the art, either. But their being real is not sufficient criterion to be included in the art.)

[ 2 ] – A concept that is useful here is “applied psychology”. Anki, Trigger Action Plans, or social accountability are applied psychology. Saying oops, murphyjitsu, fermi estimates, or a particulare TAP to ask for an example are (applied) rationality. I have rely on many practices that are applied psychology but not applied rationality.

I don’t claim that this is a perfectly crisp distinction. The two categories blend into each other, and most applied rationality does depend on some applied psychology for implementation. But I think it is helpful to recognize that not all techniques that involve your mind are “rationality.”

What is of increasing marginal value, and what is of decreasing?

See also: this comment

There’s a key question that should govern the way we engage with the world: “Which things have increasing marginal returns and which things have decreasing marginal returns.”

Sometimes people are like “I’ll compromise between what I like doing and what has impact, finding something that scores pretty good on both.” Or they’ll say, “I was already planning to get a PhD in [field]/ run a camp for high schoolers / do personal development training / etc. I’ll make this more EA on the margin.”

They are doomed. There’s a logistic success curve, and there are orders of magnitude differences in the impact of different interventions. Which problem you work on is by far the most important determinant of your impact on the world, since most things don’t matter very much at all. The difference between the best project you can find and some pretty good project, is often so large as to swamp every other choice that you make. And within a project, there are a bunch of choices to be made, which themselves can make orders of magnitude of difference, often the difference between “this is an amazing project and this is basically worthless.”

By deciding to compromise, to only half-way optimize, you’re knowingly throwing away most of your selection pressure. You’ve lost lost pretty much all your hope of doing anything meaningful. In a domain where the core problems are not solved by additive incremental improvements, a half-assed action rounds down to worthless.

On the other hand, sometimes people push themselves extra hard to work one more hour at the end of the day when they’re tired and flagging. Often, but not always, they are pwnd. Your last hour of the day is not your best hour of the day. Very likely, it’s your worst hour. For most of the work that we have to do, higher quality hours are superlinearly more valuable than lower quality hours. Slowly burning yourself out, or limiting your rest (which might curtail your highest quality hours tomorrow), so that you can eke out a one more low quality hour of work, is a bad trade. You would be better off not worrying that much about it, and you definitely shouldn’t be taking on big costs for “optimizing your productivity” if “optimizing your productivity” is mainly about getting in increasingly low marginal value work hours.

Some variables have increasing marginal returns. We need to identify those, so that we can aggressively optimize as hard as we can on those, including making hard sacrifices to climb a little higher on the ladder.

Some variables have decreasing marginal returns. We want to identify those so that we can sacrifice on them, and otherwise not spend much attention or effort on them.

Getting which is which right, is basically crucial. Messing up in either direction can leak most of the potential value. Given that, it seems like more people attempting to be ambitiously altruistic should be spending more cognitive effort, trying to get this question right.

My policy on attempting to get people crypreserved

My current policy: If for, whatever reason, I have been allocated decision making power over what to do with a person’s remains, I will, by default,  attempt to get them cryopreserved. But if they expressed a different preference while alive, I would honor that preference.

For instance, if [my partner] was incapacitated right now, and legally dead, and I was responsible for making that decision, I would push to cryopreserve her. This is not a straightforward extrapolation of her preferences, since, currently, she is not opposed in principle, but doesn’t want to spend money that could be allocated for a better altruistic payoff. But she’s also open to being convinced. If, after clear consideration, she prefers not to be cryopreserved, I would respect and act on that preference. But if I needed to make a decision right now, without the possibility of any further discussion, I would try to get her cryopresereved.

(Also, I would consider if there was an acausal trade that I could make with her values as I understand them, such that those values as I understand them would benefit from the situation, attempting to simulate how the conversation that we didn’t have would have gone. But I don’t commit to fully executing her values as I currently understand them. In places of ambiguity, I would err on the side of what I think is good to do, from my own perspective. That said, after having written the previous sentence. I think it is wrong, in that it doesn’t pass the golden rule for what I would hope she would do if our positions on reversed, which suggests that on general principles, when I am deciding on behalf of a person, I should attempt to execute their values as faithfully as I can (modulo, my own clearly stated ethical injunctions), and if I want something else, to attempt to acausally compensate their values for the trade…That does seem like the obviously correct thing to do.

Ok. I now think that that’s what I would do in this situation: cryopreserve my partner, in part on behalf of my own desire that she live and in part on behalf of the possibility that she would want to be cryopreserved on further reflection, had she had the opportunity for further reflection. And insofar as I am acting on behalf of my own desire that she live, I would attempt to make some kind of trade with her values such that the fraction of probability in which she would have concluded that this is not what she wants, had she had more time to reflect, is appropriately compensated, somehow.

That is a little bit tricky, because most of my budget is already eaten up by optimization for the cosmic altruistic good, so I’m not sure what what I would have to trade, that I counter-factually would not have given anyway. And the fact that I’m in this situation, suggests that I actually do need more of a slack budget that isn’t committed to the cosmic altruistic good, so that I have a budget to trade with. But it seems like something weird has happened if considering how to better satisfy my partner’s values has resulted in my generically spending less of my resources on what my partner values, as a policy. So it seems like something is wonky here.)

Same policy with my family: If my dad or sister was incapacitated, soon to be legally dead, I would push to cryopreserve him/her. But if they had seriously considered the idea and decided against, for any reason, including reasons that I think are stupid, I would respect, and execute, their wishes. [For what it is worth, my dad doesn’t have “beliefs” exactly, so much as postures, but last time he mentioned cryonics, he said something like “I’m into cryonics”/“I think it makes sense.“]

This policy is in part because I guess that cryonics is the right choice and in part because this option preserves optionality in a way that the other doesn’t. If a person is undecided, or hasn’t thought about it much, I want to pick the reversible option for them.

[Indeed, this is mostly why I am signed up myself. I suspect that the philosophy of the future won’t put much value on personal identity. But also, it seems crazy to permanently lock in a choice on the basis of philosophical speculations, produced with my monkey brain, in a confused pre-intelligence-explosion civilization.]

Separately, if a person expressed a wish to be cryopreserved, including casually in conversation (eg “yeah, I think cryonics makes sense”), but hadn’t filled out the goddamn forms, I’ll spend some budget of heroics on trying to get them cryopreserved

I have now been in that situation twice in my life. :angry: Sign up for cryonics, people! Don’t make me do a bunch of stressful coordination and schlepping, to get you a much worse outcome, than if you had just done the paperwork. I do not think it is ok to push for cryopreservation unless one of these conditions (I have been given some authority to decide or the person specifically requested cryo) obtains. I think it is not ok to randomly seize control of what happens to a person’s remains, counter to their wishes, because you think you know better than they did.

Disembodied materialists and embodied hippies?

An observation:

Philosophical materialists (for instance, Yudkowskian rationalists) are often rather disembodied. In contrast, hippies, who express a (sometimes vague) philosophy of non-material being, are usually very embodied.

On the face of it, this seems backwards. If materialists were living their philosophy in practice, it seems like they would be doing something different. This isn’t merely a matter of preference or aesthetics; I think that materialists often mis-predict reality on this dimension. I’ve several times heard an atheist materialist express surprise that, after losing weight or getting in shape, their mood or their ability to think is different. Usually, they would not have verbally endorsed the proposition that one’s body doesn’t impact one’s cognition, but nevertheless, the experience is a surprise for them, as if their implicit model of reality is one of dualism. [an example: Penn Jillette expressing this sentiment following his weight loss]

Ironically, we materialists tend to have an intuitive view of ourselves as disembodied minds inhabiting a body, as opposed to the (more correct) view that flows from our abstract philosophy, that our mind is a body, and if you change my body in various ways, would change me. And hippies, ironically, seem much less likely to make that sort of error.

Why is this?

One possibility is that the causality mostly goes in the other direction: the reason why a person is a materialist is due to a powerfully developed capacity for abstract thought, which is downstream of disembodiment.

The default perspective for a human is dualism, and you reach another conclusion

When is an event surprising enough that I should be confused?

Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.

In retrospect, I should have been surprised that “Rohin” kept talking about what Eliezer says in the Sequences. I wouldn’t have guessed that Rohin was that “culturally rationalist” or that he would be that interested in what Eliezer said in the sequences. And indeed, I was updating that Rohin was more of a rationalist, with more rationalist interests, than I had thought. If I had been more surprised, I could have noticed my surprise / confusion, and made a better prediction.

But on the other hand, was my surprise so extreme that it should have triggered an error message (confusion), instead of merely an update? Maybe this was just fine reasoning after all?

From a Bayesian perspective, I should have observed this evidence, and increased my credence in both Rohin being more rationalist-y than I thought, and also in the hypothesis that this wasn’t written by Rohin. But practically, I would have needed to generate the second hypothesis, and I don’t think that I had strong enough reason to.

I feel like there’s a semi-interesting epistemic puzzle here. What’s the threshold for a surprising enough observation that you should be confused (much less notice your confusion)?

First conclusions from reflections on my life

I spent some time over the past weekend reflecting on my life, over the past few years. 

I turned 28 a few months ago. The idea was that I wanted to reflect on what I’ve learned from my 20s, but do it a few years early, so that I can start doing better sooner rather than later. (In general, I think doing post mortems before a project has ended is underutilized.) 

I spent far less time than I had hoped. I budgeted 2 and a half days free from external distractions, but ended up largely overwhelmed by a major internal distraction. I might or might not do more thinking here, but for starters, here are some of my conclusions.

For now I’m going to focus on my specific regrets: things that I wish I had done differently, because I would be in a better place now if I had done them. There are plenty of things that I was wrong about, or mistakes that I made, which I don’t have a sense of disappointment in my heart about, because those mistakes were the sort of thing that either did or could have help propel me forward. But of the things I list here, all of these held me back. I am worse today than I might have been, in a very tangible-to-me way, because of these errors.

I wish that I had made more things

I wish that, when I look over my life from this vantage point, that it was “fuller”, that I could see more things that I accomplished, more specific value that my efforts produced.

I spent huge swaths of time thinking about a bunch of different things over the years, or planning / taking steps on various projects, but they rarely reached fruition. Like most people, I think, My history is littered with places where I started putting effort into something, but now have nothing to show for it.

This seems like a huge waste. 

I was looking through some really rough blog posts that I wrote in 2019 (for instance, this one, which rather than being any refined theory, is closer to a post mortem on a particular afternoon of work). And to my surprise, they were concretely helpful to me, more helpful to me than any blog post that I’ve read by someone else in a while. Past-Eli actually did figure out some stuff, but somehow, I had forgotten it.

I spend a lot of time thinking, but I think that if I don’t produce some kind of artifact, the thinking that I do is not just not shared with the world, but is lost to me. Creating something isn’t an extra step, its the crystallization of the cognitive work itself. If I don’t create an artifact, the cognitive work is transient, it leaves no impression on me or the world. It might as well not have happened.

And aside from that, I would feel better about my life now, if instead of a bunch of things that I thought about, there were a bunch of blog posts that I published, even if they were in shitty draft form. To the extent that I can look over my life and see a bunch (small, bad) things that I did, I feel better about my life.

I would feel much better if every place where I had had a cool idea, I had made something, and I could look over them all, and see what I had done.

Going forward, I’m going to hold to the policy that every project should have a deliverable, even if it is something very simple: a shitty blogpost, a google doc a test session, an explanation of what I learned (recorded, and posted on youtube), an MVP app.

And in support of this, I also want to have a policy that as soon as I feel like I have something that I could write up, I do that immediately, instead of adding it to my todo list. Often, I’ll do some thinking about something, and have the sketch of how to write it up in my head, but there’s a little bit of activation energy required to sit down and do it, and I have a bunch of things on my plate (including other essays that I want to write). But then I’ll wait too long, and by the time I come back to it, it doesn’t feel alive anymore.

This is what happened with some recent thinking that I did about ELK for instance. I did clarify some things for myself, and intended to write it up, but by the time I went to do that, it felt stale. And so an ELK weekend, that I participated in a while back back is one more project where I had new thoughts but mostly nothing will come of them.

For this reason, I’m pushing myself to write up this document, right now. I want to create some crystallization of the meager thinking that I did when reflecting on my life, that puts a stake in the ground so that I don’t realize some things, and then just forget about them.

I wish that I made a point to write down the arguments that I was steering by

From 2018 to early 2020, I did not pursue a project that seemed to me like the obvious thing for me to be doing, because of combination of constraints involving considerations of info security, some philosophy of uncertainty problems, and underlying both of those, some ego-attachments. I was instead sort of in a holding pattern: hoping/planning to go forward with something, but not actually taking action on it.

[I don’t want to undersell the ego stuff as my just being unvirtuous. I think it was tracking some thing that were in fact bad, and if I had had sufficient skill, I could have untangled it, and had courage and agency. But I can’t think of what straightforward policy would have allowed me to do that, given the social context that I was in.]

In retrospect the arguments that I was steering my life by were…just not very good. I think if I had made a point to write them up, to clarify what I was doing, and why I was doing it, this would have caused me to notice that they didn’t really hold up. 

If for no other reason than that I would share my google docs, and people would argue against my points.

And in any case, I had the intention at the time of orienting to those arguments and trying to do original applied philosophy to find solutions, or at least better framings, for those problems. And I did this a tensy weency bit, but I didn’t make solid progress. And I think that I could have. And the main thing that I needed to do was actually write up what I was thinking so I could build on it (and secondarily, so other people could comment on it). 

(I’m in particular thinking about some ideas I had in conversation with Scott G at an MSFP. There was a blog-post writing day during that workshop, and I contemplated writing it up (I think actually had a vague intention to write it up sometime), but didn’t because I was tired or something.)

And I think this as been pretty generically true. A lot of my sense of what’s important or how things work seems to have drifted along a seemingly random walk, instead of being a series of specific updates for reasons. 

After I panic-bought, during covid, I made a policy that I don’t move money without at least writing up a one-pager explaining what I’m doing and what my reason for doing it is. This allows me to notice if my reason is stupid (“I just saw some news articles and now I’m panicked”) and it allows me to reflect on my actual thought process, not just the result of my thought process later. (Come to think of it, I think it might be true that my most costly financial decision every might be the only other time that I didn’t follow this policy! I should double check that!)

I think I should have a similar policy here. Any argument or consideration that I’m steering my life by, I should write up as as a google doc, with good argumentative structure. 

The thing that I need, to implement this policy is a the trigger. What would cause me to notice arguments that I’m steering my life by.

I wish I had recorded myself more

[inspired by this tweet]

When I was younger, it was important to me to meet my wife early, so that she could have known me when I was young, to understand what I was like and where I grew from. 

I’ve recently started dating someone, and I wish she was able to know what younger Eli was like. She can read my writing, but the essays and diary entries that I wrote are low bandwidth for getting a sense of a person. 

If I had made a vlog or something, I would have lots and lots of video to watch which would help her get a sense of what I was like.

Similarly, for if I ever have kids, I would like them to be able to know what I was like at their age. 

Furthermore, I spent some time over the past day listening to audio recordings that I made over the last decade. I was shocked by the samples of the way my younger self was, and I wish that I had more of those recorded to compare against.

I feel like I’ve sort of permanently missed the boat on this one. I’ve permanently lost access to some information that I wish I had. But I have a heuristic on a much smaller scale: if I’m in a conversation, and I have the thought “I wish I had been recording this conversation”, I start recording right then. It seems like this same heuristic should apply at the macro scale: if I have the thought “I wish I had been regularly recording myself 10 years ago, I should start doing that now.

I wish that I did more things with discrete time boxes, so that I could notice that I failed

There were very few places where I concretely failed at something, and drew attention to the fact that I failed. As noted, there were lots and lots of projects that never reached fruition, but mostly I just punted on those, intending to continue them. If I had a bad day, I was often afraid to cut my losses and just not do the thing that I had hoped to do.

There are lots of skills that I planned to learn, and then I would attempt (usually in an unfocused way) to learn them in some amount of time, and at the end of that period of time I would not have made much progress. But I would implicitly move out my timeline for learning those things; my failing to make progress did not cause me to give up or allow me to consider not making that skill a part of me at some point. I allowed myself to keep punting my plans to the indefinite future.

This was probably self-fulfilling. Since I knew that if I failed to do or learn something in the short term, I wouldn’t actually count that as a failure in any meaningful sense, I would still be planning to get it somehow, I wasn’t really incentivized to do or learn the thing in that short term.

I think that one thing that would have helped me was planning to do things on specific time horizons (this weekend, this week, this month, whatever), and scheduling a post mortem, ideally with another person, on my calendar, at the end of that time horizon.

Now, I don’t think that this would have worked, directly, I think I still would have squandered that time, or made much slower progress than I hoped. But I by having a crisp demarcation of when I wanted to have a project completed by, scheduled in such a way that I can’t just explain it away as no longer relevant (because I made less progress than I had hoped to make by the time it came around), I would more concretely notice and orient to the fact that something that I had tried to do hadn’t worked. And then I could iterate from there.

I intend to do this going forward. Which concretely means that I should look over my current projects, and timebox out at least one of them, and schedule with someone to postmortem with me.

I should have focused on learning by doing

Most of what I have tried to do over the past decade is acquire skills.

This has not been wholly unsuccessful. I do in fact now posses a number of specific skills that most people don’t, and I have gone from broadly incompetent (but undaunted) to broadly competent, in general.

But most of the specific skill learning that I tried to do seems to have been close to fruitless. Much of what I learned I learned in the process of just working on direct projects. (Though not all of it! I’ve recently noticed how much of my emotional communication and facilitation skills are downstream of doing a lot of Circling, and, I guess, from doing SAS in particular).

 I think that I would have done much better to focus less on building skills and to focus more on just doing concrete things that seemed cool to me. 

(And indeed, I knew this at the time, but didn’t act on it, because of reasons related “choosing projects felt like choosing my identity”, and a maybe a general thing of not taking my obvious known mistakes seriously enough, and maybe something else.

I’m going to have a firm rule for the next six months: I’m allowing myself to still try to acquire skills, but this always to be in the context of a project in which I am building something: 

Paternalism is about outrage

I’m listening to the Minds Almost Meeting podcast episode on Paternalism.

I think Robin is missing or misemphasizing something that is central to the puzzle that he’s investigating. Namely, I think most regulation (or most regulation that is not rooted in special interest groups creating moats around their rent streams), is made not with a focus on the customer, but rather with a focus on the business being regulated.

The psychological-causal story of how most regulation comes to be is not that the voter reflects on how to help the customer make good choices, and concludes that it is best to constrain their options. Instead the voter hears about or imagines a situation in which a company takes advantage of someone, and feels outraged. There’s a feeling of “that shouldn’t be allowed”, and that the government should stop people from doing things that shouldn’t be allowed.

Not much thought is given to the consideration that you might just inform people to make better choices. That doesn’t satisfy the sense of outrage at a powerful party taking advantage of a weaker party. The focus of attention is not on helping the party being taken advantage of, but on venting the outrage.

What You See Is All There Is, and the question of “what costs does this impose on other people in the system, who might or might not be being exploited”, doesn’t arise.

Most regulation (again, aside from the regulation that is simple rent-seeking) is the result of this sort of thing: