Reflecting on some regret about not trying to join and improve specific org(s)

I started a new job recently, which has prompted me to reflect on my work over the past few years, and how I could have done better.

Concretely, I regret not joining SERI MATS, and helping it succeed, when it was first getting started. 

I think this might have been a great fit for me: I had existing skills and experience that I think would have been helpful for them. The seasonal on-off schedule would have given me the flexibility to do and learn other things. It would have (I think) helped me get a better grounding in Machine Learning and technical alignment approaches.

And if I had joined with an eye towards agentically shaping the organization’s culture and priorities as it developed, I think I would have had a positive impact on the seed that has grown into the current alignment field. In particular, I think I might have had leverage to establish some cultural norms regarding how to think about the positive and negative impacts of one’s work.1

I regarded MATS as the obvious thing to do. The nascent alignment field was bottlenecked on mentorship—a small number of people (arguably) had good taste for the kinds of research that were on track, but had limited bandwidth for research mentorship, so conveying that research taste was (and is?) a bottleneck for the whole ecosystem. A program aiming to unblock everything else so as to expand the capacity for research mentorship as much as possible seemed like the obvious, straightforward thing to do.

I said as much in my post from early 2023:

There is now explicit infrastructure to teach and mentor these new people though, and that seems great. It had seemed for a while that the bottleneck for people coming to do good safety research was mentorship from people that already have some amount of traction on the problem. Someone noticed this and set up a system to make it as easy as possible for experienced alignment researchers to mentor as many junior researchers as they want to, without needing to do a bunch of assessment of candidates or to deal with logistics. Given the state of the world, this seems like an obvious thing to do.

I don’t know that this will actually work (especially if most of the existing researchers are themselves doing work that dodges the core problem), but it is absolutely the thing to try for making more excellent alignment researchers doing real work. And it might turn out that this is just a scalable way to build a healthy field.

In retrospect, I should have written those paragraphs and generated the next thought “I should actively go try to get involved in SERI MATS and see if I can help them.”

So why didn’t I?

Misapplied notion of counterfactual impact

I didn’t do this because I was operating on the model/assumption that, while this was important, they were already doing it and were probably not in danger of failing at it. It was being taken care of, so I didn’t need to do it.

I now think that was probably a mistake. Because I didn’t get involved, I don’t know one way or the other, but it seems plausible to me that I could have contributed to making the overall project substantially better: more effective and with better positive externalities. 

This isn’t because I’ve learned anything in particular about how SERI MATS missed the mark; it’s just that getting more exposure to organizations has adjusted my prior: even if an organization is broadly working, and not in danger of collapse, it may still be the case that I can personally make it much better with my efforts. In particular, I think it will sometimes be the case that there is room to substantially improve an organization in ways that don’t line up very neatly with the specific roles that they’re attempting to explicitly hire for, if you have strategic orientation and specific relevant experience.2

This realization is downstream of my interactions with Palisade over recent weeks. Also, Ronny made a comment a few years ago (paraphrased) that “you shouldn’t work for an organization unless you’re at least a little bit trying to reform it”. That stuck with me, and changed my concept of “working for an org”.

Possibly this difference in frame is also partially downstream of thinking a bit about Shapley values through reading Planecrash and thinking about donation-matching for SFC. (I previously aimed to do things that, if I didn’t do them, wouldn’t happen. Now, I’ve continuous-ized that notion, and aim for, approximately, high Shapley value.)
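
To make that distinction concrete, here’s a minimal sketch (a toy example of my own, not drawn from Planecrash or from SFC’s actual matching mechanics): suppose a project worth funding goes ahead if either of two donors steps up. Each donor’s counterfactual impact is zero, yet each has a Shapley value of one half.

```python
import math
from itertools import permutations

def shapley_values(players, value):
    """Each player's marginal contribution to the coalition's value,
    averaged over every order in which the players could have joined."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    n_orders = math.factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

# Toy scenario: a project worth 1 happens if *either* donor funds it.
def project_value(coalition):
    return 1.0 if coalition & {"donor_a", "donor_b"} else 0.0

print(shapley_values(["donor_a", "donor_b"], project_value))
# -> {'donor_a': 0.5, 'donor_b': 0.5}
# Each donor's counterfactual impact is 0 (drop either one and the project
# still happens), but each gets Shapley credit of 0.5: the "continuous-ized"
# notion of impact gestured at above.
```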

Underestimating the value of “having a job”

Also, regarding SERI MATS potentially being a good fit for me in particular: I think I have historically underestimated the value of having a job for structuring one’s life and supporting personal learning. I currently wish that I had more technical background in ML and alignment/control work, and I think I might have gotten more of that if I had been actively trying to develop in that direction while supporting MATS in a non-technical capacity, instead of trying to develop that background (inconsistently) on my own.

Strategic misgivings

I didn’t invest heavily in any project over recent years because there wasn’t much that I straightforwardly believed in. As noted above, the idea-of-MATS was a possible exception to this—it seemed like the obvious thing to do given the constraints of the world. And I now think I should take “this seems like the obvious thing to do” as a much stronger indicator that I should get involved with a project, somehow, and figure out how to help, than I previously did.

But part of what held me back from doing that was misgivings about the degree to which MATS was acting as a feeder pool for the scaling labs. MATS is another project that doesn’t seem obviously robustly good to me (or “net-positive”, though I kind of think that’s the wrong frame). As with many projects, I felt reluctant to put my full force behind it for that reason.

In retrospect, I think maybe I should have shown up and tried to solve the problem of “it seems like we’re plausibly doing real harm, and that seems unethical” from the inside. I could have repeatedly and vocally drawn attention to it, raised it as a consideration in strategic and tactical planning, etc. Either I would have shaped the culture around this problem for the MATS staff sufficiently that I trusted the overall organism to optimize safely, or we would have bounced off of each other unproductively. And in that second case, we could have parted ways, and I could have moved on.

In general, this feels like a more obvious affordance to me now: if I think something is promising, but I don’t trust it to have positive impacts, I try non-disruptively making it better according to the standards that I think are important, and if that doesn’t work or doesn’t go well, I part ways with the org.

This all raises the question: “should I still try to work for SERI MATS and make it much better?”

My guess is that the opportunity is smaller now than it was a few years ago, because both the culture and the processes of the org have largely found an equilibrium that works. There’s more leverage to make an org much better while it’s still figuring out how to do the thing it’s trying to do than after it has reached product-market fit and is mostly finding ways to reproduce that product consistently and reliably.

That said, one common class of error is overestimating the degree to which an opportunity has passed: e.g., not buying Bitcoin in 2017 because you believe that you’ve already missed the big opportunity—it’s true in some sense, but you’re underestimating how much of the opportunity still remains.

So, if I were still unattached, writing this essay would prompt me to reach out to Ryan and say directly that I’m interested in exploring working for MATS, and try to get more contact with the territory so that I can see for myself. As it is, I have a job which seems like it needs me more, and which I anticipate will absorb my attention for at least the next year.

  1. Note: of all the things I wrote here, this is the point that I am most uncertain of. It seems plausible to me that, because of psychological dynamics akin to “It is difficult to get a man to understand something, when his salary depends on his not understanding it”, and classic EA-style psychological commitment to life narratives that impart meaning via impact, the cultural norms around how the ecosystem as a whole thinks about positive and negative impacts were and are basically immovable. Or rather, I might have been able to make more-or-less performative hand-wringing fashionable, and possibly cause people to have less of an action-bias, but not actually produce norms that lead to more robustly positive outcomes.

    At least, I don’t have a handle on either how to approach these questions myself, or how to effectively intervene on the culture around them. And so I’m not clear on whether I could have made things better in this way. But I could have made this my explicit goal and tried, and made some progress, or not. ↩︎
  2. A bit of context that is maybe important: I have not applied for a job since I was 21 and looking for an interim job during college. Every single job that I’ve gotten in my adult life has resulted from either my just showing up and figuring out how I could be helpful, or someone I already know reaching out to me and asking me for help with a project.

    For me at least, “show up and figure out what is needed and make that happen” is a pretty straightforward pattern of action, but it might be foreign to other people who have a different conception of jobs, one that is more centered on specific roles that you’re well-suited for, and on doing a good job in those roles. ↩︎

Consideration Factoring: a relative of Double Crux

[Epistemic status: work in progress, at least insofar as I haven’t really nailed down the type-signature of “factors” in the general case. Nevertheless, I do this or something like this pretty frequently and it works for me. There are probably a bunch of prerequisites, only some of which I’m tracking, though.]

This post describes a framework I sometimes use when navigating (attempting to get to the truth of and resolve) a disagreement with someone. It is clearly related to the Double Crux framework, but is distinct enough that I think of it as an alternative to Double Crux. (Though in my personal practice, of course, I sometimes move flexibly between frameworks.)

I claim no originality. Just like everything in the space of rationality, many people already do this, or something like this.

Articulating the taste that inclines me to use one method in one conversational circumstance and a different method in a different circumstance is tricky. But a main trigger for using this one is when I am in a conversation with someone, and it seems like they keep “jumping all over the place” or switching between different arguments and considerations. Whenever I try to check if a consideration is a crux (or share an alternative model of that consideration), they bring up a different consideration. The conversation jumps around, and we don’t dig into any one thing for very long. Everything feels kind of slippery somehow.

(I want to emphasize that this pattern does not mean the other person is acting in bad faith. Their belief is probably a compressed gestalt of a bunch of different factors, which are probably not well organized by default. So when you make a counterargument to one point, they refer to their implicit model, the counterpoint you made seems irrelevant or absurd to them, and they try to express what that counterpoint is missing.)

When something like that is happening, it’s a trigger to get paper (this process absolutely requires externalized, shared, working memory), and start doing consideration factoring.

Step 1: Factor the Considerations

1a: List factors

The first step is basically to (more-or-less) goal factor. You want to elicit from your partner all of the considerations that motivate their position, and write those down on a piece of paper.

For me, so far, this usually involves formulating the disagreement as an action or a world-state, and then asking what the important consequences of that action or world-state are. If your partner thinks that it is a good idea to invest 100,000 EA dollars in project X, and you disagree, you might factor out all of the good consequences that your partner expects from project X.

However, the type signature of your factors is not always “goods.” I don’t yet have a clean formalism that describes what the correct type signature is, in full generality. But it is something like “reasons why Z is important”, or “ways that Z is important”, where the two of you disagree about the importance of Z.

For instance, I had a disagreement with someone about how important / valuable it is that rationality development happen within CFAR, as opposed to some other context: he thought it was all but crucial, or at least that doing it elsewhere would throw away huge swaths of value, while I thought it didn’t matter much one way or the other. More specifically, he said that he thought that CFAR had a number of valuable resources that it would be very costly for some outside group to accrue.

So together, we made a list of those resources. We came up with:

  1. Ability to attract talent
  2. The ability to propagate content through the rationality and EA communities
  3. The alumni network
  4. Funding
  5. Credibility and good reputation in the rationality community
  6. Credibility and good reputation in the broader world outside of the rationality community

My scratch paper:

[image: my scratch paper from this conversation]

(We agreed that #5 was really only relevant insofar as it contributed to #2, so we lumped them together. The check marks are from later in the conversation, after we resolved some factors.)

Here, we have a disagreement which is something like “how replaceable are the resources that CFAR has accrued”, and we factor it into the individual resources, each of which we can engage with separately. (Importantly, when I looked at our list, I thought that for each resource, either 1) it isn’t that important, 2) CFAR doesn’t have much of it, or 3) it would not be very hard for a new group to acquire it from scratch.)

1b: Relevance and completeness checks

Importantly, don’t forget to do relevance and completeness checks:

  • Relevance: If all of these considerations but one were “taken care of” to your satisfaction, would you change your mind about the main disagreement? Or is that last factor doing important work that you don’t want to lose?
  • Completeness: If all of these considerations were “taken care of” to your satisfaction, would you change your mind about the main disagreement? Or is something missing?

[Notice that the completeness check and the relevance check on each factor, taken together, are isomorphic to a crux-check on the conjunction of all of the factors.]

Step 2: Investigate each of the factors

Next, discuss each of the factors in turn.

2a: Rank the factors

Do a breadth-first analysis of which branches seem most interesting to talk about, where “interesting” is some combination of “how cruxy that factor is for your view”, “how cruxy that factor is for your partner’s view”, and “how much the two of you disagree about that factor.”

You’ll get to everything eventually, but it makes sense to do the most interesting factors first.

The two of you spend a few minutes superficially discussing each one, and assessing which seems most juicy to continue with first.

2b: Discuss each factor in turn

Usually, I’ll take out a new sheet of paper for each factor.

Here you’ll need to be seriously and continuously applying all of the standard Double Crux / convergence TAPs. In particular, you should be repeatedly...

  • Operationalizing to specific cases,
  • Paraphrasing what you understand your partner to have said, and
  • Crux-checking (for yourself) all of their claims, as they make them.

[I know. I know, I haven’t even written up all of these basics, yet. I’m working on it.]

This is where the work is done, and where most of the skill lies. As a general heuristic, I would not share an alternative model or make a counterargument until we’ve agreed on a specific, visualizable story that describes my partner’s point and I can paraphrase that point to my partner’s satisfaction (pass their ITT).

In general, a huge amount of the heavy lifting is done by being ultra specific. You want to be working with very specific stories, with clarity about who is doing what and what the consequences are. If my partner says “MIRI needs prestige in order to attract top technical talent”, I’ll attempt to translate that into a specific story…

“Ok, so for instance, there’s a 99.9th percentile programmer, let’s call him Bob, who works at Google, and he comes to an AIRCS workshop, and has a good time, and basically agrees that AI safety is important. But he also doesn’t really want to leave his current job, which is comfortable and prestigious, and so he sort of slides off of the whole x-risk thing. But if MIRI were more prestigious, in the way that, say, RAND used to be prestigious (most people who read the New York Times know about MIRI, and people are impressed when you say you work at MIRI), Bob is much more likely to actually quit his job and go work on AI alignment at MIRI?”

…and then check if my partner feels like that story has captured what they were trying to say. (Checking is important! Much of the time, my partner wants to correct my story in some way. I keep offering modified versions of it until I give a version that they certify as capturing their view.)

Very often, telling specific stories clears out misconceptions: either correcting my mistaken understanding of what the other person is saying, or helping me to notice places where some model that I’m proposing doesn’t seem realistic in practice. [One could write several posts on just the skillful use of specificity in convergence conversations.]

Similarly, you have to be continually maintaining the attitude of trying to change your own mind, not trying to convince your partner.

Sometimes the factoring is recursive: it makes sense to further subdivide the considerations within each factor. (For instance, in the conversation referenced above about rationality development at CFAR, we took the factor of “CFAR has or could easily get credibility outside of the rationality / EA communities” and asked “what does extra-community credibility buy us?” This produced the factors “access to government agencies, Fortune 500 companies, universities, and other places of power” and “leverage for raising the sanity waterline.” Then we might talk about how much each of those sub-factors matters.)

(In my experience) your partner will probably still try to jump around between the factors: you’ll be discussing factor 1, and they’ll bring in a consideration from factor 4. Because of this, one of the things you need to be doing is, gently and firmly, keeping the discussion on one factor at a time. Every time my partner seems to try to jump, I’ll suggest that their point seems more relevant to [that other factor] than to this one, and check if they agree. (The checking is really important! It’s pretty likely that I’ve misunderstood what they’re saying.) If they agree, then I’ll say something like “cool, so let’s put that to the side for a moment, and just focus on [the factor we’re talking about] for now. We’ll get to [the other factor] in a bit.” I might also make a note of the point they were starting to make on the sheet for [the other factor]. Often, they’ll try to jump a few more times, and then get the hang of this.

In general, while you should be leading and facilitating the process, every step should be a consensus between the two of you. You suggest a direction to steer the conversation, and check if that direction seems good to your partner. If they don’t feel interested in moving in that direction, or feel like that is leaving something important out, you should be highly receptive to that.

If at any point your partner feels “caught out”, or annoyed that they’ve trapped themselves, you’ve done something wrong. This procedure, and mapping things out on paper, should feel like a relief to them, because the two of you can take things one at a time, and can trust that everything important will be gotten to.

Sometimes, you will semi-accidentally stumble across a Double Crux for your top-level disagreement that cuts across your factors. In this case you could switch to using the Double Crux methodology, or stick with Consideration Factoring. In practice, finding a Double Crux means that it becomes much faster to engage with each new factor, because you’ve already done the core untangling work for each one before you’ve even started on it.

Conclusion

This is just one framework among a few, but I’ve gotten a lot of mileage from it lately.

Using the facilitator to make sure that each person’s point is held

[Epistemic status: This is a strategy that I know works well from my own experience, but also depends on some prereqs.

I guess this is a draft for my Double Crux Facilitation sequence.]

Followup to: Something simple to try in conversations

Related to: Politics is the Mind-Killer, Against Disclaimers

Here’s a simple model that is extremely important to making difficult conversations go well:

Sometimes, when a person is participating in a conversation or an argument, they will be holding onto a “point” that they want to convey.

For instance…

  • A group is deciding which kind of air conditioner to get, and you understand that one brand is much more efficient than the others, for the same price.
  • You’re listening to a discussion between two intellectuals who you can tell are talking past each other, and you have the perfect metaphor that will clarify things for both of them.
  • Your startup is deciding how to respond to an embarrassing product failure, and one of the cofounders wants to release a statement that you think will be off-putting to many of your customers.

As a rule, when a person is “holding onto” a point that they want to make, they are unable to listen well.

The point that a person wants to make relates to something that’s important to them. If it seems that their conversational partners are not going to understand or incorporate that point, that important value is likely to be lost. Reasonably, this entails a kind of anxiety.

So, to the extent that it seems to you that your point won’t be heard or incorporated, you’ll agitatedly push for airtime, at the expense of good listening. Which, unfortunately, results in a coordination problem of each person pushing to get their point heard and no one listening. Which, of course, makes it more likely that any given point won’t be heard, triggering a positive feedback loop.

In general, this means that conversations are harder to the degree that…

  1. The topic matters to the participants.
  2. The participants’ visceral expectation is that they won’t be heard.

(Which is a large part of the reason why difficult conversations get harder as the number of participants increases. More people means more points competing to be heard, which exacerbates the death spiral.)

Digression

I think this goes a long way towards explicating why politics is a mind-killer. Political discourse is a domain which…

  1. Matters personally to many participants, and
  2. Includes a vast number of “conversational participants”,
  3. Who might take unilateral action, on the basis of whatever arguments they hear, good or bad.

Given that setup, it is quite reasonable to treat arguments as soldiers. When you see someone supporting, or even appearing to support, a policy or ideology that you consider abhorrent or dangerous, there is a natural and reasonable anxiety that the value you’re protecting will be lost. And there is a natural (if usually poorly executed) desire to correct the misconception in the common knowledge before it gets away from you. Or, failing that, to tear down the offending argument / discredit the person making it.

(To see an example of the thing that one is viscerally fearing, see the history of Eric Drexler’s promotion of nanotechnology. Drexler made arguments about nanotech, which he hoped would direct resources in such a way that the future could be made much better. His opponents attacked strawmen of those arguments. The conversation “got away” from Drexler, and the whole audience discounted the ideas he supported, thus preventing any progress towards the potential future that Drexler was hoping to help bring into being.

I think the visceral fear of something like this happening to you is what motivates “treating arguments as soldiers”.)

End digression

Given this, one of the main things that needs to happen to make a conversation go well is for each participant to (epistemically!) alieve that their point will be gotten to and heard. Otherwise, they can’t be expected to put it aside (even for a moment) in order to listen carefully to their interlocutor (because doing so would increase the risk of their point in fact not being heard).

When I’m mediating conversations, one strategy that I employ to facilitate this is to use my role as the facilitator to “hold” the points of both sides. That is (sometimes before the participants even start talking to each other), I’ll first have each one (one at a time) convey their point to me. And I don’t go on until I can pass the ITT of that person’s point, to their (and my) satisfaction.

Usually, when I’m able to pass the ITT, there’s a sense of relief from that participant. They now know that I understand their point, so whatever happens in the conversation, it won’t get lost or neglected. Now, they can relax and focus on understanding what the other person has to say.

Of course, with sufficient skill, one of the participants can put aside their point (before it’s been heard by anyone) in order to listen. But that is often asking too much of your interlocutors, because doing the “putting aside” motion, even for a moment, is hard, especially when what’s at stake is important. (I can’t always do it.)

Outsourcing this step to the facilitator is much easier, because the facilitator has less that is viscerally at stake for them (and has more metacognition to track the meta-level of the conversation).

I’m curious if this is new to folks or not. Give me feedback.

 

Something simple to try in conversations

Last night, I was outlining conversational techniques and the times when it would be useful to use them, so that I could make more specific Trigger Action Plans. (By “conversational technique” I mean things like “ask for an example / generate a hypothesis”, “repeat back, in your own words, what the other person just said”, and “consider what would make you change your mind”.) I quickly noticed a way of carving up the space which seemed useful to me and potentially interesting to my (currently non-existent) readers.

In a conversation, second to second, you may be trying to understand what another person is saying, or you may be trying to help them understand what you are trying to convey. There are perhaps some other possibilities, such as trying to figure out a new domain together, but even then, at any given moment one of you is likely to be explaining and the other listening.

It seems quite useful to me to track which of the two you are aiming to do at any given moment: understand, or get someone to understand.

Being aware of who is doing what allows two conversationalists to coordinate, verbally and explicitly, if need be. A conversation is apt to go better when participant A is focused on listening while participant B is focused on explicating, and vice versa. Discussions often become less manageable when both parties are too busy explaining to listen.

Before I start training more specific conversational TAPs, I’ve begun simply paying attention to which of the two I’m doing, second to second.