My policy on attempting to get people cryopreserved

My current policy: If, for whatever reason, I have been allocated decision-making power over what to do with a person’s remains, I will, by default, attempt to get them cryopreserved. But if they expressed a different preference while alive, I would honor that preference.

For instance, if [my partner] was incapacitated right now, and legally dead, and I was responsible for making that decision, I would push to cryopreserve her. This is not a straightforward extrapolation of her preferences, since, currently, she is not opposed in principle, but doesn’t want to spend money that could be allocated for a better altruistic payoff. But she’s also open to being convinced. If, after clear consideration, she prefers not to be cryopreserved, I would respect and act on that preference. But if I needed to make a decision right now, without the possibility of any further discussion, I would try to get her cryopreserved.

(Also, I would consider if there was an acausal trade that I could make with her values as I understand them, such that those values as I understand them would benefit from the situation, attempting to simulate how the conversation that we didn’t have would have gone. But I don’t commit to fully executing her values as I currently understand them. In places of ambiguity, I would err on the side of what I think is good to do, from my own perspective. That said, after having written the previous sentence, I think it is wrong, in that it doesn’t pass the golden rule for what I would hope she would do if our positions were reversed, which suggests that on general principles, when I am deciding on behalf of a person, I should attempt to execute their values as faithfully as I can (modulo my own clearly stated ethical injunctions), and if I want something else, to attempt to acausally compensate their values for the trade…That does seem like the obviously correct thing to do.

Ok. I now think that that’s what I would do in this situation: cryopreserve my partner, in part on behalf of my own desire that she live and in part on behalf of the possibility that she would want to be cryopreserved on further reflection, had she had the opportunity for further reflection. And insofar as I am acting on behalf of my own desire that she live, I would attempt to make some kind of trade with her values such that the fraction of probability in which she would have concluded that this is not what she wants, had she had more time to reflect, is appropriately compensated, somehow.

That is a little bit tricky, because most of my budget is already eaten up by optimization for the cosmic altruistic good, so I’m not sure what I would have to trade, that I counter-factually would not have given anyway. And the fact that I’m in this situation suggests that I actually do need more of a slack budget that isn’t committed to the cosmic altruistic good, so that I have a budget to trade with. But it seems like something weird has happened if considering how to better satisfy my partner’s values has resulted in my generically spending less of my resources on what my partner values, as a policy. So it seems like something is wonky here.)

Same policy with my family: If my dad or sister was incapacitated, soon to be legally dead, I would push to cryopreserve him/her. But if they had seriously considered the idea and decided against, for any reason, including reasons that I think are stupid, I would respect, and execute, their wishes. [For what it is worth, my dad doesn’t have “beliefs” exactly, so much as postures, but last time he mentioned cryonics, he said something like “I’m into cryonics”/“I think it makes sense.”]

This policy is in part because I guess that cryonics is the right choice and in part because this option preserves optionality in a way that the other doesn’t. If a person is undecided, or hasn’t thought about it much, I want to pick the reversible option for them.

[Indeed, this is mostly why I am signed up myself. I suspect that the philosophy of the future won’t put much value on personal identity. But also, it seems crazy to permanently lock in a choice on the basis of philosophical speculations, produced with my monkey brain, in a confused pre-intelligence-explosion civilization.]

Separately, if a person expressed a wish to be cryopreserved, including casually in conversation (eg “yeah, I think cryonics makes sense”), but hadn’t filled out the goddamn forms, I’ll spend some budget of heroics on trying to get them cryopreserved.

I have now been in that situation twice in my life. :angry: Sign up for cryonics, people! Don’t make me do a bunch of stressful coordination and schlepping, to get you a much worse outcome than if you had just done the paperwork.

I do not think it is ok to push for cryopreservation unless one of these conditions (I have been given some authority to decide or the person specifically requested cryo) obtains. I think it is not ok to randomly seize control of what happens to a person’s remains, counter to their wishes.

Disembodied materialists and embodied hippies?

An observation:

Philosophical materialists (for instance, Yudkowskian rationalists) are often rather disembodied. In contrast, hippies, who express a (sometimes vague) philosophy of non-material being, are usually very embodied.

On the face of it, this seems backwards. If materialists were living their philosophy in practice, it seems like they would be doing something different. This isn’t merely a matter of preference or aesthetics; I think that materialists often mis-predict reality on this dimension. I’ve several times heard an atheist materialist express surprise that, after losing weight or getting in shape, their mood or their ability to think is different. Usually, they would not have verbally endorsed the proposition that one’s body doesn’t impact one’s cognition, but nevertheless, the experience is a surprise for them, as if their implicit model of reality is one of dualism. [an example: Penn Jillette expressing this sentiment following his weight loss]

Ironically, we materialists tend to have an intuitive view of ourselves as disembodied minds inhabiting a body, as opposed to the (more correct) view that flows from our abstract philosophy: that our mind is a body, and that changing my body in various ways would change me. And hippies, ironically, seem much less likely to make that sort of error.

Why is this?

One possibility is that the causality mostly goes in the other direction: the reason why a person is a materialist is due to a powerfully developed capacity for abstract thought, which is downstream of disembodiment.

The default perspective for a human is dualism, and you reach another conclusion

When is an event surprising enough that I should be confused?

Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.

In retrospect, I should have been surprised that “Rohin” kept talking about what Eliezer says in the Sequences. I wouldn’t have guessed that Rohin was that “culturally rationalist” or that he would be that interested in what Eliezer said in the sequences. And indeed, I was updating that Rohin was more of a rationalist, with more rationalist interests, than I had thought. If I had been more surprised, I could have noticed my surprise / confusion, and made a better prediction.

But on the other hand, was my surprise so extreme that it should have triggered an error message (confusion), instead of merely an update? Maybe this was just fine reasoning after all?

From a Bayesian perspective, I should have observed this evidence, and increased my credence in both Rohin being more rationalist-y than I thought, and also in the hypothesis that this wasn’t written by Rohin. But practically, I would have needed to generate the second hypothesis, and I don’t think that I had strong enough reason to.
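To make that Bayesian point concrete, here is a minimal sketch of the update. All of the numbers are made up for illustration (I did not actually have explicit credences at the time), and the hypothesis names and the `posterior` helper are hypothetical. It only shows the shape of the update: when the observation is about equally likely under "Rohin is more rationalist-y than I thought" and "this wasn't written by Rohin", both hypotheses gain credence, but the second starts from such a low prior that it stays small.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


# Assumed prior credences before noticing anything odd (illustrative only).
priors = {
    "rohin_not_very_rationalist": 0.70,
    "rohin_more_rationalist": 0.28,
    "not_written_by_rohin": 0.02,
}

# Assumed P(the post keeps citing Eliezer and the Sequences | hypothesis).
likelihoods = {
    "rohin_not_very_rationalist": 0.05,
    "rohin_more_rationalist": 0.50,
    "not_written_by_rohin": 0.50,
}

post = posterior(priors, likelihoods)
for hypothesis, credence in post.items():
    print(f"{hypothesis}: {priors[hypothesis]:.2f} -> {credence:.2f}")
```

Under these assumed numbers, the "not written by Rohin" hypothesis roughly triples but still ends up a small slice of the total, which fits the feeling that merely updating (rather than being confused) was not obviously a mistake.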

I feel like there’s a semi-interesting epistemic puzzle here. What’s the threshold for a surprising enough observation that you should be confused (much less notice your confusion)?

First conclusions from reflections on my life

I spent some time over the past weekend reflecting on my life, over the past few years. 

I turned 28 a few months ago. The idea was that I wanted to reflect on what I’ve learned from my 20s, but do it a few years early, so that I can start doing better sooner rather than later. (In general, I think doing post mortems before a project has ended is underutilized.) 

I spent far less time than I had hoped. I budgeted 2 and a half days free from external distractions, but ended up largely overwhelmed by a major internal distraction. I might or might not do more thinking here, but for starters, here are some of my conclusions.

For now I’m going to focus on my specific regrets: things that I wish I had done differently, because I would be in a better place now if I had done them. There are plenty of things that I was wrong about, or mistakes that I made, which I don’t have a sense of disappointment in my heart about, because those mistakes were the sort of thing that either did or could have helped propel me forward. But the things I list here all held me back. I am worse today than I might have been, in a very tangible-to-me way, because of these errors.

I wish that I had made more things

I wish that, when I look over my life from this vantage point, it was “fuller”, that I could see more things that I accomplished, more specific value that my efforts produced.

I spent huge swaths of time thinking about a bunch of different things over the years, or planning / taking steps on various projects, but they rarely reached fruition. Like most people, I think, my history is littered with places where I started putting effort into something, but now have nothing to show for it.

This seems like a huge waste. 

I was looking through some really rough blog posts that I wrote in 2019 (for instance, this one, which rather than being any refined theory, is closer to a post mortem on a particular afternoon of work). And to my surprise, they were concretely helpful to me, more helpful to me than any blog post that I’ve read by someone else in a while. Past-Eli actually did figure out some stuff, but somehow, I had forgotten it.

I spend a lot of time thinking, but I think that if I don’t produce some kind of artifact, the thinking that I do is not just not shared with the world, but is lost to me. Creating something isn’t an extra step, it’s the crystallization of the cognitive work itself. If I don’t create an artifact, the cognitive work is transient; it leaves no impression on me or the world. It might as well not have happened.

And aside from that, I would feel better about my life now, if instead of a bunch of things that I thought about, there were a bunch of blog posts that I published, even if they were in shitty draft form. To the extent that I can look over my life and see a bunch of (small, bad) things that I did, I feel better about my life.

I would feel much better if every place where I had had a cool idea, I had made something, and I could look over them all, and see what I had done.

Going forward, I’m going to hold to the policy that every project should have a deliverable, even if it is something very simple: a shitty blogpost, a google doc, a test session, an explanation of what I learned (recorded, and posted on youtube), an MVP app.

And in support of this, I also want to have a policy that as soon as I feel like I have something that I could write up, I do that immediately, instead of adding it to my todo list. Often, I’ll do some thinking about something, and have the sketch of how to write it up in my head, but there’s a little bit of activation energy required to sit down and do it, and I have a bunch of things on my plate (including other essays that I want to write). But then I’ll wait too long, and by the time I come back to it, it doesn’t feel alive anymore.

This is what happened with some recent thinking that I did about ELK, for instance. I did clarify some things for myself, and intended to write it up, but by the time I went to do that, it felt stale. And so an ELK weekend that I participated in a while back is one more project where I had new thoughts but mostly nothing will come of them.

For this reason, I’m pushing myself to write up this document, right now. I want to create some crystallization of the meager thinking that I did when reflecting on my life, that puts a stake in the ground so that I don’t realize some things, and then just forget about them.

I wish that I made a point to write down the arguments that I was steering by

From 2018 to early 2020, I did not pursue a project that seemed to me like the obvious thing for me to be doing, because of a combination of constraints involving considerations of info security, some philosophy of uncertainty problems, and underlying both of those, some ego-attachments. I was instead sort of in a holding pattern: hoping/planning to go forward with something, but not actually taking action on it.

[I don’t want to undersell the ego stuff as my just being unvirtuous. I think it was tracking some things that were in fact bad, and if I had had sufficient skill, I could have untangled it, and had courage and agency. But I can’t think of what straightforward policy would have allowed me to do that, given the social context that I was in.]

In retrospect the arguments that I was steering my life by were…just not very good. I think if I had made a point to write them up, to clarify what I was doing, and why I was doing it, this would have caused me to notice that they didn’t really hold up. 

If for no other reason than that I would share my google docs, and people would argue against my points.

And in any case, I had the intention at the time of orienting to those arguments and trying to do original applied philosophy to find solutions, or at least better framings, for those problems. And I did this a teensy weensy bit, but I didn’t make solid progress. And I think that I could have. And the main thing that I needed to do was actually write up what I was thinking so I could build on it (and secondarily, so other people could comment on it).

(I’m in particular thinking about some ideas I had in conversation with Scott G at an MSFP. There was a blog-post writing day during that workshop, and I contemplated writing it up (I think I actually had a vague intention to write it up sometime), but didn’t because I was tired or something.)

And I think this has been pretty generically true. A lot of my sense of what’s important or how things work seems to have drifted along a seemingly random walk, instead of being a series of specific updates for reasons.

After I panic-bought, during covid, I made a policy that I don’t move money without at least writing up a one-pager explaining what I’m doing and what my reason for doing it is. This allows me to notice if my reason is stupid (“I just saw some news articles and now I’m panicked”) and it allows me to reflect on my actual thought process, not just the result of my thought process, later. (Come to think of it, I think it might be true that my most costly financial decision ever might be the only other time that I didn’t follow this policy! I should double check that!)

I think I should have a similar policy here. Any argument or consideration that I’m steering my life by, I should write up as a google doc, with good argumentative structure.

The thing that I need, to implement this policy, is a trigger: what would cause me to notice arguments that I’m steering my life by?

I wish I had recorded myself more

[inspired by this tweet]

When I was younger, it was important to me to meet my wife early, so that she could have known me when I was young, to understand what I was like and where I grew from. 

I’ve recently started dating someone, and I wish she was able to know what younger Eli was like. She can read my writing, but the essays and diary entries that I wrote are low bandwidth for getting a sense of a person. 

If I had made a vlog or something, I would have lots and lots of video to watch which would help her get a sense of what I was like.

Similarly, for if I ever have kids, I would like them to be able to know what I was like at their age. 

Furthermore, I spent some time over the past day listening to audio recordings that I made over the last decade. I was shocked by the samples of the way my younger self was, and I wish that I had more of those recorded to compare against.

I feel like I’ve sort of permanently missed the boat on this one. I’ve permanently lost access to some information that I wish I had. But I have a heuristic on a much smaller scale: if I’m in a conversation, and I have the thought “I wish I had been recording this conversation”, I start recording right then. It seems like this same heuristic should apply at the macro scale: if I have the thought “I wish I had been regularly recording myself 10 years ago”, I should start doing that now.

I wish that I did more things with discrete time boxes, so that I could notice that I failed

There were very few places where I concretely failed at something, and drew attention to the fact that I failed. As noted, there were lots and lots of projects that never reached fruition, but mostly I just punted on those, intending to continue them. If I had a bad day, I was often afraid to cut my losses and just not do the thing that I had hoped to do.

There are lots of skills that I planned to learn, and then I would attempt (usually in an unfocused way) to learn them in some amount of time, and at the end of that period of time I would not have made much progress. But I would implicitly move out my timeline for learning those things; my failing to make progress did not cause me to give up or allow me to consider not making that skill a part of me at some point. I allowed myself to keep punting my plans to the indefinite future.

This was probably self-fulfilling. Since I knew that if I failed to do or learn something in the short term, I wouldn’t actually count that as a failure in any meaningful sense (I would still be planning to get it somehow), I wasn’t really incentivized to do or learn the thing in that short term.

I think that one thing that would have helped me was planning to do things on specific time horizons (this weekend, this week, this month, whatever), and scheduling a post mortem, ideally with another person, on my calendar, at the end of that time horizon.

Now, I don’t think that this would have worked, directly. I think I still would have squandered that time, or made much slower progress than I hoped. But by having a crisp demarcation of when I wanted to have a project completed by, scheduled in such a way that I can’t just explain it away as no longer relevant (because I made less progress than I had hoped to make by the time it came around), I would more concretely notice and orient to the fact that something that I had tried to do hadn’t worked. And then I could iterate from there.

I intend to do this going forward. Which concretely means that I should look over my current projects, and timebox out at least one of them, and schedule with someone to postmortem with me.

I should have focused on learning by doing

Most of what I have tried to do over the past decade is acquire skills.

This has not been wholly unsuccessful. I do in fact now possess a number of specific skills that most people don’t, and I have gone from broadly incompetent (but undaunted) to broadly competent, in general.

But most of the specific skill learning that I tried to do seems to have been close to fruitless. Much of what I learned I learned in the process of just working on direct projects. (Though not all of it! I’ve recently noticed how much of my emotional communication and facilitation skills are downstream of doing a lot of Circling, and, I guess, from doing SAS in particular).

I think that I would have done much better to focus less on building skills and to focus more on just doing concrete things that seemed cool to me.

(And indeed, I knew this at the time, but didn’t act on it, because of reasons related to “choosing projects felt like choosing my identity”, and maybe a general thing of not taking my obvious known mistakes seriously enough, and maybe something else.)

I’m going to have a firm rule for the next six months: I’m allowing myself to still try to acquire skills, but this is always to be in the context of a project in which I am building something: 

Paternalism is about outrage

I’m listening to the Minds Almost Meeting podcast episode on Paternalism.

I think Robin is missing or misemphasizing something that is central to the puzzle that he’s investigating. Namely, I think most regulation (or most regulation that is not rooted in special interest groups creating moats around their rent streams), is made not with a focus on the customer, but rather with a focus on the business being regulated.

The psychological-causal story of how most regulation comes to be is not that the voter reflects on how to help the customer make good choices, and concludes that it is best to constrain their options. Instead the voter hears about or imagines a situation in which a company takes advantage of someone, and feels outraged. There’s a feeling of “that shouldn’t be allowed”, and that the government should stop people from doing things that shouldn’t be allowed.

Not much thought is given to the consideration that you might just inform people to make better choices. That doesn’t satisfy the sense of outrage at a powerful party taking advantage of a weaker party. The focus of attention is not on helping the party being taken advantage of, but on venting the outrage.

What You See Is All There Is, and the question of “what costs does this impose on other people in the system, who might or might not be being exploited”, doesn’t arise.

Most regulation (again, aside from the regulation that is simple rent-seeking) is the result of this sort of thing:

Thinking about how to orient to a hostile information environment, when you don’t have the skills or the inclination to become an epistemology nerd

Successfully propagandized people don’t think they’ve been propagandized; if you would expect to feel the same way in either case, you have to distinguish between the two possibilities using something other than your feelings.

Duncan Sabien

I wish my dad understood this point.

But it’s pretty emotionally stressful to live in a world where you can’t trust your info streams and you can’t really have a grasp on what’s going on.

Like, if I tell my dad not to trust the New York Times, because it will regularly misinform him, and that “science” as in “trust the science” is a fake buzzword, about as likely to be rooted in actual scientific epistemology as not, he has a few reactions. But one of them is “What do you want me to do? Become a rationalist?”

And he has a point. He’s just not going to read covid preprints himself, to piece together what’s going on. That would take hours and hours of time that he doesn’t want to spend, it would be hard and annoying and it isn’t like he would have calibrated Bayesian takes at the end.

(To be clear, I didn’t do that with covid either, but I could do it, at least somewhat, if I needed to, and I did do little pieces of it, which puts me on a firmer footing in knowing which epistemic processes to trust.)

Given that he’s not going to do that, and I don’t really think that he should do that, what should he do?

One answer is “just downgrade your confidence in everything. Have a blanket sense of ‘actually, I don’t really know what’s going on.’ ” A fundamental rationalist skill is not making stuff up, and saying “I don’t know.” I did spend a few hours trying to orient on the Ukraine situation, and forcing myself to get all the way to the point of making some quantitative predictions (so that I have the opportunity to be surprised, and notice that I am surprised). But my fundamental stance is “I don’t understand what’s going on, and I know that I don’t understand. (Also here are some specific things that I don’t know.)”

…Ok. Maybe that is feasible. It’s pretty hard to live in a world where you fundamentally don’t know what’s happening, where people assume you have some tribal opinion about stuff and your answer is “I don’t know, I think my views are basically informed by propaganda, and I’m not skilled enough or invested enough to try to do better, so I’m going to not believe or promote my takes.”

But maybe this becomes easier if the goal of your orientation in the world is less to have a take on what’s going on, but is instead to prioritize uncertainties: to figure out which questions seem most relevant for understanding, so that you have _some_ map to orient from, even if it is mostly just a map of your uncertainty.

Some un-edited writeups of conversations that I’ve had with Ben Hoffman, and Jessica Taylor, and Michael Vassar

Extended Paraphrase of Ben and Jessica’s general view [December 2019]

Eli’s Summary of a Conversation with Vassar [October 2020]

Overview of my conversation with Vassar on Wednesday Feb 10: Trauma patterns [February 2021]

Eli’s notes from Spencer’s interview with Vassar [February 2021]

Parsing Ben and Michael’s key points from that giant twitter conversation [May 2021]

Two interlocking control systems

When I was practicing touch typing I found that much of the skill was a matter of going as fast as I could, without letting my speed outpace my accuracy. If I could feel that the precision of my finger placements was high, I would put more “oomph” into my typing, pushing harder to go faster. 

But I would often fall into an attractor of “rushing” or “going off the rails”, where I was pushing to go fast in a way that caused my accuracy to fall apart, and I started to “trip over myself”. I made a point to notice this starting to happen and then intentionally slow down (and relax my shoulders) to focus on the precision of my finger placements. The goal was never to rush (because that is counterproductive), but to go as fast as possible within that constraint.

I think there might be an analogous thing in my personal productivity. 

When I have a largish amount to get done in a short amount of time, this can be energizing and motivating. My physiological arousal is higher. My personal tempo is faster. There’s a kind of energy or motivation that comes from having things that need to get done, with deadlines, and it boots me up into a higher energy orbital, where my default mental actions are geared towards making progress, instead of random “I don’t feel like it” sort of laziness. There’s a bit of a tailwind behind me.

(Indeed, this kind of pressure is exactly what was missing for most of 2020.)

However, sometimes this pressure gets overwhelming, and my intentionality collapses. It’s too much. Either I don’t have the spaciousness to let my attention fully engage with any given task (which is usually necessary for making progress) because of the competing goal threads, instead only managing a shallow superficial attention, or I’ll get overwhelmed and opt out of all of it by distracting myself.

There’s this important principle that I never want my tailwind to outpace my structure. Having some amount of pressure speeding me along is great, but only if my intentionality is high enough to still absorb everything that’s coming at me, taking in the input of what’s important, orienting to it, and taking action on it.

Too much tailwind and that intentionality collapses.

Which means that I need a control system that keeps those two metrics in sync. I need to notice when my intentionality is starting to collapse, and take actions to slow things down and to shore up my intentionality. 

However, my intentionality can collapse for another reason, other than getting outpaced by motivation-pressure. It also collapses when I’m low on energy and alertness.

My intentionality depends on my personal energy and alertness. When my energy and alertness is depleted, the inner structure of my intentionality tends to collapse. 

(There are some caveats here. For one thing, it is possible to maintain intentionality in a low energy state. Also, I can sometimes depend on external structure as a substitute for intentionality, and external structure depends much less on my personal energy and alertness. But to a first approximation, low energy -> low intentionality.)

As a consequence of this, the control system maintaining my intentionality propagates back to an earlier control system maintaining my energy level. I want to notice when my energy is flagging, when I’m just starting to run on fumes, and take action to shore up my energy, before my intentionality collapses.

Furthermore, because my personal energy and alertness is at the bottom of the stack, a lot of my energy and alertness maintenance is not structured as a control system. I employ strategies to get good sleep, and to exercise every day, independently of my current energy level, because high energy is self sustaining.

Having some practices that are “foundational” rather than implemented as control systems is costlier, because it means that I’ll sometimes engage in them when they are not strongly necessary. But foundational systems are more robust: they have more slack in the system to absorb perturbations.

Aversions inhibit slow focus

I’ve written elsewhere about how the biggest factor in my personal productivity is aversions, and skillfully engaging with aversions. It’s maybe unsurprising that having an aversion to a task is relevant to effectively executing on that task. But it is a bit more surprising that having an aversion to some task or consideration makes it much much less likely that I’ll effectively execute on anything.

The key insight, I think, is that engaging deeply in a task entails clearing some mental space.

Aversion to something increases my compulsiveness / distractibility. I’m more likely to take a bathroom break, or to make food for myself, or to reread old blog posts on my phone (without jotting down my thoughts in the way that makes reading more productive / creative), or to go check twitter and then get stuck in the twitter loop.[1]

I think this is because I’m feeling some small constant pain, and part of me is compulsively seeking positive stimulation to distract from the pain. Basically holding an aversion makes me more reactive to stray thoughts and affordances of the environment. My immediate actions are driven by a (subtle, but nevertheless dominating) clawing, grasping, drive for positive sensation, instead of flowing from “my deep values”, my sense of what seems cool or alive. 

Most, but not all, forms of creative work, involve making mental space, quieting those distractions so that I can give my full attention to the thing that I’m trying to do. The reason why aversions kill my productivity is that my compulsive stimulation-hunger is too graspy to settle down into any long-threaded thought. That part of me doesn’t want to be still, because it is seeking distraction from the sensation in me.

(The exception is some forms of work that “fit” this compulsiveness, where I can get sucked into compulsively doing some task as a way to distract from the sensation in my body. Sometimes an essay is of the right shape that it can be a hook in just the right way, but most of my work is not like this.)

Generally, when I notice an aversion, I’ll engage with it directly, either by sitting down and meditating, feeling into the sensation in a non semantic way, or by doing focusing / journaling, which is more of a semantic “dialogue”, or something that is a mix of both approaches.

In doing this, I’m first just trying to make space for the sensation, to feel it without distraction, while also being welcoming towards the part of me that is doing the dissociation, and secondly hoping to get more understanding and context, so that I can start planning and taking action regarding the underlying concern of the aversion.

[1] I found myself doing all of these except the last one today, all the while vaguely / liminally aware of the agitation clench in my belly, before I sat down to engage with it directly.

How do we prepare for final crunch time? – Some initial thoughts

[epistemic status: Brainstorming and first draft thoughts.

Inspired by something that Ruby Bloom wrote and the Paul Christiano episode of the 80,000 Hours podcast.]

One claim I sometimes hear about AI alignment [paraphrase]:

“It is really hard to know what sorts of AI alignment work are good, this far out from transformative AI. As we get closer, we’ll have a clearer sense of what AGI / Transformative AI is likely to actually look like, and we’ll have much better traction on what kind of alignment work to do. In fact, it might be the case that MOST of the work of AI alignment is done in the final few years before AGI, when we’ve solved most of the hard capabilities problems already and we can work directly, with good feedback loops, on the sorts of systems that we want to align.”

Usually this is taken to mean that the alignment research that is being done today is primarily to enable or make easier future, more critical, alignment work. But “progress in the field” is only one dimension to consider in boosting the work of alignment researchers in final crunch time.

In this post I want to take the above posit seriously, and consider the implications. If most of the alignment work that will be done is going to be done in the final few years before the deadline, our job in 2021 is mostly to do everything that we can to enable the people working on the problem in the crucial period (which might be us, or our successors, or both) so that they are as well equipped as we can possibly make them.

What are all the ways that we can think of that we can prepare now, for our eventual final exam? What should we be investing in, to improve our efficacy in those final, crucial, years?

The following are some ideas.


For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.

A different kind of work

Most current AI alignment work is pretty abstract and theoretical, for two reasons. 

The first reason is a philosophical / methodological claim: There’s a fundamental “nearest unblocked strategy” / overfitting problem. Patches that correct clear and obvious alignment failures are unlikely to generalize fully; you’ll only have constrained unaligned optimization to channels that you can’t recognize. For this reason, some claim, we need to have an extremely robust, theoretical understanding of intelligence and alignment, ideally at the level of proofs.

The second reason is a practical consideration: we just don’t have powerful AI systems to work with, so there isn’t much in the way of tinkering and getting feedback.

The second objection becomes less relevant in final crunch time: in this scenario, we’ll have powerful systems 1) that will be built along the same lines as the systems that it is crucial to align and 2) that will have enough intellectual capability to pose at least semi-realistic “creative” alignment failures (i.e., current systems are so dumb, and live in such constrained environments, that it isn’t clear how much we can learn about aligning literal superintelligences from them).

And even if the first objection ultimately holds, theoretical understanding often (usually?) follows from practical engineering proficiency. It seems like it might be a fruitful path to tinker with semi-powerful systems trying out different alignment approaches empirically, and tinkering to discover new approaches, and then backing up to do robust theory-building given much richer data about what seems to work.

I could imagine sophisticated setups that enable this kind of tinkering and theory building. For instance, I imagine a setup that includes:

  • A “sandbox” that affords easy implementation of many different AI architectures and custom combinations of architectures, with a wide variety of easy-to-create, easy-to-adjust training schemes, and a full suite of interpretability tools. We could quickly try out different safety schemes, in different distributions, and observe what kinds of cognition and behavior result.
  • A meta AI that observes the sandbox, and all of the experiments therein, to learn general principles of alignment. We could use interpretability tools to use this AI as a “microscope” on the AI alignment problem itself, abstracting out patterns and dynamics that we couldn’t easily have teased out with only our own brains. This meta system might also play some role in designing the experiments to run in the sandbox, to allow it to get the best data to test its hypotheses.
  • A theorem prover that would formalize the properties and implications of those general alignment principles, to give us crisply specified alignment criteria by which we can evaluate AI designs.

Obviously, working with a full system like this is quite different than abstract, purely theoretical work on decision theory or logical uncertainty. It is closer to the sort of experiments that the OpenAI and DeepMind safety teams have published, but even that is a pretty far cry from the kind of rapid-feedback tinkering that I’m pointing at here.

Given that the kind of work that leads to research progress might be very different in final crunch time than it is now, it seems worth trying to forecast what shape that work will take and trying to see if there are ways to practice doing that kind of work before final crunch time.


Obviously, when we get to final crunch time, we don’t want to have to spend any time studying fields that we could have studied in the lead-up years. We want to have already learned all the information and ways of thinking that we’ll want to know by then. It seems worth considering what fields we’ll wish we had known when the time comes.

The obvious contenders:

  • Machine Learning
  • Machine Learning interpretability
  • All the Math of Intelligence that humanity has yet amassed [Probability theory, Causality, etc.]

Some less obvious possibilities:

  • Neuroscience?
  • Geopolitics, if it turns out that which technical approach is ideal hinges on important facts about the balance of power?
  • Computer security?
  • Mechanism design in general?

Research methodology / Scientific “rationality”

We want the research teams tackling this problem in final crunch time to have the best scientific methodology and the best cognitive tools / habits for making research progress, that we can manage to provide them.

This maybe includes skills or methods in the domains of:

  • Ways to notice as early as possible if you’re following an ultimately-fruitless research path
  • Noticing / Resolving / Avoiding blindspots
  • Effective research teams
  • Original seeing / overcoming theory blindness / hypothesis generation
  • ???


One obvious thing is to spend time now investing in habits and strategies for effective productivity. It seems senseless to waste precious hours in our acute crunch time due to procrastination or poor sleep. It is well worth it to solve those problems now. But aside from the general suggestion to get your shit in order and develop good habits now, I can think of two more specific things that seem good to do.

Practice no-cost-too-large productive periods

There may be trades that could make people more productive on the margin, but are too expensive in regular life. For instance, I think that I might conceivably benefit from having a dedicated person whose job is to always be near me, so that I can duck with them with 0 friction. I’ve experimented a little bit with similar ideas (like having a list of people on call to duck with), but it doesn’t seem worth it for me to pay a whole extra person-salary to have the person be on call, and in the same building, instead of on-call via Zoom.

But it is worth it at final crunch time.

It might be worth it to spend some period of time, maybe a week, maybe a month, every year, optimizing unrestrainedly for research productivity, with no heed to cost at all, so that we can practice how to do that. This is possibly a good thing to do anyway, because it might uncover trades that actually, on reflection, are worth importing into my regular life.

Optimize rest

One particular subset of personal productivity that jumps out at me: each person should figure out their actual optimal cadence of rest.

There’s a failure mode that ambitious people commonly fall into, which is working past the point when marginal hours of work are negative. When the whole cosmic endowment is on the line, there will be a natural temptation to push yourself to work as hard as you can, and forgo rest. Obviously, this is a mistake. Rest isn’t just a luxury: it is one of the inputs to productive work.

There is a second level of this error in which one, grudgingly, takes the minimal amount of rest time, and gets back to work. But the amount of rest time required to stay functional is not the optimal amount of rest, the amount that maximizes productive output. Eliezer mused years ago that he felt kind of guilty about it, but maybe he should actually take two days off between research days, because the quality of his research seemed better on days when he happened to have had two rest days preceding.

In final crunch time, we want everyone to be resting the optimal amount that actually maximizes area under the curve, not the one that maximizes work-hours. We should do binary search now, to figure out what the optimum is.
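To make the “search for the optimum” idea concrete, here is a minimal sketch. Strictly speaking, binary search finds a threshold; for finding the peak of a single-peaked curve, the analogous procedure is ternary search, which is what this uses. Everything in it is an assumption for illustration: the function names are hypothetical, and toy_weekly_output is a made-up curve, not data about anyone’s actual productivity. In real life each “measurement” would be noisy and would probably take weeks of trying a given rest schedule.

```python
from typing import Callable


def find_best_rest(output_for: Callable[[float], float],
                   lo: float, hi: float, tol: float = 0.01) -> float:
    """Ternary search for the amount of rest that maximizes output.

    Assumes output_for is single-peaked on [lo, hi]: it rises up to
    some optimum and falls after it.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if output_for(m1) < output_for(m2):
            lo = m1  # the peak cannot be to the left of m1
        else:
            hi = m2  # the peak cannot be to the right of m2
    return (lo + hi) / 2


def toy_weekly_output(rest_days: float) -> float:
    """Hypothetical model: fewer work days, but each one is higher quality."""
    work_days = 7 - rest_days
    quality_per_day = 1 + rest_days  # made-up assumption, not a measurement
    return work_days * quality_per_day


best = find_best_rest(toy_weekly_output, 0.0, 7.0)
print(f"best rest days per week under the toy model: {best:.2f}")
```

The point of the toy model is only that the rest level that keeps you functional (close to zero here) and the rest level that maximizes total output can be very far apart.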

Also, obviously, we should explore to discover highly effective methods of rest, instead of doing whatever random things seem good (unless, as it turns out, “whatever random thing seems good” is actually the best way to rest).

Picking up new tools

One thing that will be happening in this time is that there will be a flurry of new AI tools that can radically transform thinking and research, perhaps increasingly radical tools coming at a rate of once a month or faster.

Being able to take advantage of those tools and start using them for research immediately, with minimal learning curve, seems extremely high leverage.

If there are things that we can do that increase the ease of picking up new tools and using them to their full potential (instead of, as is common, using only the features afforded by your old tools and only very gradually

Some thoughts (probably bad):

  • Could we set up our workflows, somehow, such that it is easy to integrate new tools into them? Like if you already have a flexible, expressive research interface (something like Roam?), and you’re used to regular changes in capability to the backend of the interface?
  • Can we just practice? Can we have a competitive game of introducing new tools, and trying to orient to them and figure out how to exploit them as creatively as possible?
  • Probably it should be some people’s full time job to translate cutting edge developments in AI into useful tools and practical workflows, and then to teach those workflows to the researchers?
  • Can we design a meta-tool that helps us figure out how to exploit new tools? Is it possible to train an AI assistant specifically for helping us get the most out of our new AI tools?
  • Can we map out the sort of constraints on human thinking and/or the sorts of tools that will be possible, in advance, so that we can practice with much weaker versions of those tools, and get a sense of how we would use them, so that we’re ready when they arrive?
  • Can we try out new tools on psychedelics, to boost neuroplasticity? Is there some other way to temporarily weaken our neural priors? Maybe some kind of training in original seeing?

Staying grounded and stable in spite of the stakes

Obviously, being one of the few hundred people on whom the whole future of the cosmos rests, while the singularity is happening around you, and you are confronted with the stark reality of how doomed we are, is scary and disorienting and destabilizing.

I imagine that that induces all kinds of psychological pressures, that might find release in any of a number of concerning outlets: by deluding one’s self about the situation, by becoming manic and frenetic, by sinking into immovable depression.

We need our people to have the virtue of being able to look the problem in the eye, with all of its terror and disorientation, and stay stable enough to make tough calls, and make them sanely.

We’re called to cultivate a virtue (or maybe a set of virtues) of which I don’t know the true name, but which involves courage and groundedness, and determination-without-denial.

I don’t know what is entailed in cultivating that virtue. Perhaps meditation? Maybe testing one’s self at literal risk to one’s life? I would guess that people in other times and places, who needed to face risk to their own lives and that of their families, did have this virtue, or some part of it, and it might be fruitful to investigate those cultures and how that virtue was cultivated.