Reflecting on some regret about not trying to join and improve specific org(s)

I started a new job recently, which has prompted me to reflect on my work over the past few years, and how I could have done better.

Concretely, I regret not joining SERI MATS, and helping it succeed, when it was first getting started. 

I think this might have been a great fit for me: I had existing skills and experience that I think would have been helpful for them. The seasonal on-off schedule would have given me the flexibility to do and learn other things. It would have (I think) helped me get a better grounding in Machine Learning and technical alignment approaches.

And if I had joined with an eye towards agentically shaping the organization’s culture and priorities as it developed, I think I would have had a positive impact on the seed that has grown into the current alignment field. In particular, I think I might have had leverage to establish some cultural norms regarding how to think about the positive and negative impacts of one’s work.[1]

I regarded MATS as the obvious thing to do. The nascent alignment field was bottlenecked on mentorship—a small number of people (arguably) had good taste for the kinds of research that were on track, but had limited bandwidth for research mentorship, so conveying that research taste was (and is?) a bottleneck for the whole ecosystem. A program aiming to unblock everything else, so as to expand the capacity for research mentorship as much as possible, seemed like the obvious, straightforward thing to do.

I said as much in my post from early 2023:

There is now explicit infrastructure to teach and mentor these new people though, and that seems great. It had seemed for a while that the bottleneck for people coming to do good safety research was mentorship from people that already have some amount of traction on the problem. Someone noticed this and set up a system to make it as easy as possible for experienced alignment researchers to mentor as many junior researchers as they want to, without needing to do a bunch of assessment of candidates or to deal with logistics. Given the state of the world, this seems like an obvious thing to do.

I don’t know that this will actually work (especially if most of the existing researchers are themselves doing work that dodges the core problem), but it is absolutely the thing to try for making more excellent alignment researchers doing real work. And it might turn out that this is just a scalable way to build a healthy field.

In retrospect, I should have written those paragraphs and generated the next thought “I should actively go try to get involved in SERI MATS and see if I can help them.”

So why didn’t I?

Misapplied notion of counterfactual impact

I didn’t do this because I was operating on the model/assumption that, while this was important, they were doing it now, and were probably not in danger of failing at it. It was taken care of and so I didn’t need to do it.

I now think that was probably a mistake. Because I didn’t get involved, I don’t know one way or the other, but it seems plausible to me that I could have contributed to making the overall project substantially better: more effective and with better positive externalities. 

This isn’t because I’ve learned anything in particular about how SERI MATS missed the mark; rather, getting more exposure to organizations has adjusted my prior: even if an organization is broadly working, and not in danger of collapse, it might still be the case that I can personally make it much better with my efforts. In particular, I think it will sometimes be the case that there is room to substantially improve an organization in ways that don’t line up very neatly with the specific roles that it’s attempting to explicitly hire for, if you have strategic orientation and specific relevant experience.[2]

This realization is downstream of my interactions with Palisade over recent weeks. Also, Ronny made a comment a few years ago (paraphrased) that “you shouldn’t work for an organization unless you’re at least a little bit trying to reform it”. That stuck with me, and changed my concept of “working for an org”.

Possibly this difference in frame is also partially downstream of thinking a bit about Shapley values, through reading Planecrash and thinking about donation-matching for SFC. (I previously aimed to do things that, if I didn’t do them, wouldn’t happen. Now I’ve continuous-ized that notion, and aim for, approximately, high Shapley value).

Underestimating the value of “having a job”

Also, regarding SERI potentially being a good fit for me in particular, I think I have historically underestimated the value of having a job for structuring one’s life and supporting personal learning. I currently wish that I had more technical background in ML and alignment/control work, and I think I might have gotten more of that if I had been actively trying to develop in that direction while supporting MATS in a non-technical capacity, instead of trying to develop that background (inconsistently) independently.

Strategic misgivings

I didn’t invest heavily in any project over recent years because there wasn’t much that I straightforwardly believed in. As noted above, the idea-of-MATS was a possible exception to this—it seemed like the obvious thing to do given the constraints of the world. And I now think I should take “this seems like the obvious thing to do” as a much stronger indicator that I should get involved with a project, somehow, and figure out how to help, than I previously did.

But part of what held me back from doing that was misgivings about the degree to which MATS was acting as a feeder pool for the scaling labs. MATS is another project that doesn’t seem obviously robustly good to me (or “net-positive”, though I kind of think that’s the wrong frame). As with many projects, I felt reluctant to put my full force behind it for that reason.

In retrospect, I think maybe I should have shown up and tried to solve the problem of “it seems like we’re doing plausible real harm, and that seems unethical” from the inside. I could have repeatedly and vocally drawn attention to it, raised it as a consideration in strategic and tactical planning, etc. Either I would have shaped the culture around this problem for the MATS staff sufficiently that I trusted the overall organism to optimize safely, or we would have bounced off of each other unproductively. And in that second case, we could part ways, and I could move on.

In general, this now feels like a more obvious affordance to me: if I think something is promising but I don’t trust it to have positive impacts, I just try non-disruptively making it better according to the standards that I think are important, and if that doesn’t work or doesn’t go well, I part ways with the org.

This all raises the question, “should I still try to work for SERI MATS and make it much better?”

My guess is that the opportunity is smaller now than it was a few years ago, because both the culture and processes of the org have largely found an equilibrium that works. There’s more leverage to make an org much better while it’s still figuring out how to do the thing it’s trying to do than after it has reached product-market fit and is mostly finding ways to reproduce that product consistently and reliably.

That said, one common class of error is overestimating the degree to which an opportunity has passed: e.g. not buying Bitcoin in 2017 because you believe that you’ve already missed the big opportunity. That’s true in some sense, but you’re underestimating how much of the opportunity still remains.

So, if I were still unattached, writing this essay would prompt me to reach out to Ryan, and say directly that I’m interested in exploring working for MATS, and try to get more contact with the territory, so that I can see for myself. As it is, I have a job which seems like it needs me more, and which I anticipate absorbing my attention for at least the next year.

  1. Note: of all the things I wrote here, this is the point that I am most uncertain of. It seems plausible to me that, because of psychological dynamics akin to “It is difficult to get a man to understand something, when his salary depends on his not understanding it”, and classic EA-style psychological commitment to life narratives that impart meaning via impact, the cultural norms around how the ecosystem as a whole thinks about positive and negative impacts were and are basically immovable. Or rather, I might have been able to make more-or-less performative hand-wringing fashionable, and possibly cause people to have less of an action-bias, but not actually produce norms that lead to more robustly positive outcomes.

    At least, I don’t have a handle on either how to approach these questions myself, or how to effectively intervene on the culture around them. And so I’m not clear on whether I could have made things better in this way. But I could have made this my explicit goal and tried, and made some progress, or not. ↩︎
  2. A bit of context that is maybe important: I have not applied for a job since I was 21, when I was looking for an interim job during college. Every single job that I’ve gotten in my adult life has resulted from either my just showing up and figuring out how I could be helpful, or someone I already know reaching out to me and asking for help with a project.

    For me at least, “show up and figure out what is needed and make that happen” is a pretty straightforward pattern of action, but it might be foreign to people who have a different conception of jobs, one more centered on specific roles that you’re well-suited for, and on doing a good job in those roles. ↩︎

How I’m learning programming

This is an outline of how I’m personally going about learning to be a better programmer, and learning to build webapps in particular.

Starting

All of my programming learning is project-based: I just dive right into trying to build some tool or script that would be concretely useful for me personally. (Whether or not it would be useful to someone else is not a question that I consider at all.)

I keep a running list of ideas for applications to build, processes to automate, and new features for software that I’ve already built, and I pick projects from that list. I’ll bias towards things that are fast to build, things that I think I might use every day, and things that are just beyond my current knowledge of how to start.

So far these tend to be…

  • scripts and processes to automate or streamline workflows that I already do manually (a concrete sketch follows this list), or
  • tools that somehow make it easier or faster for me to learn or practice something.
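
To make that concrete, here’s a minimal sketch of the kind of “streamline a manual workflow” script I mean. It’s a hypothetical example, not a project from my list, and the paths and folder layout are illustrative assumptions:

```python
# Hypothetical example: automate a filing chore I'd otherwise repeat by hand.
# Moves every PDF in ~/Downloads into ~/Documents/papers/YYYY-MM, creating
# the month folder as needed.
import shutil
from datetime import date
from pathlib import Path

downloads = Path.home() / "Downloads"
papers = Path.home() / "Documents" / "papers"

# One folder per month, e.g. papers/2024-06
month_dir = papers / date.today().strftime("%Y-%m")
month_dir.mkdir(parents=True, exist_ok=True)

for pdf in downloads.glob("*.pdf"):
    shutil.move(str(pdf), month_dir / pdf.name)
    print(f"Filed {pdf.name} -> {month_dir}")
```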

Then I’ll find a tutor to help me. If I’m hacking on or extending an existing piece of software, I might find the community hubs for that software and make a post saying that I’m a medium-novice programmer who’s interested in building some [whatever] extensions, and ask if anyone would be interested in tutoring me (a combination of teaching me how to do it and pair programming with me) in exchange for money. For instance, when there were some simple Roam extensions that I wanted to build, I asked in the #roam-hacking channel of the Roam Slack, and found a person to work with that way.

For this, I’m usually not looking for someone who’s done a lot of tutoring, necessarily, just a moderately skilled software engineer who’s familiar with the software we’re working with.

For projects that are starting from scratch, I’ll start by writing a one-page spec of what I want, and ask my software engineering friends what tech stack they think I should use. Then I’ll search for tutors on Wyzant.com who are familiar with those technologies.

With Wyzant tutors, I’ll typically try a few over the course of a week or two to filter for one that’s really good. One of the main things I’m looking for is whether they’re tracking my level of understanding (are their answers to my questions meeting me where I currently am, or are they skipping over inferential distance?), and how easy it is to get them to give me the space to think through a problem myself, instead of just telling me the answer.

Each session

I’ll typically book 4-hour sessions with a tutor, with a planned 20-minute break in the middle.

I always do programming sessions with a dual-monitor setup: VS Code and the application itself (usually in a browser) on the larger external monitor, and a notes page on my laptop monitor.

Depending on how familiar I am with the tech stack that we’re working with, the tutor might be almost entirely steering. But I do all of the typing, and I ask questions about what various things do and why as we go. Anything that feels “big” in my mind, I paraphrase back, to consolidate my understanding.

I’ll take notes on key concepts and on the syntax of the language that I’m using. I also err on the side of commenting too much, leaving pretty detailed notes about what each block of code does for future-Eli.

For any given bug, I’ll first generate a sketch of how I would try to solve it if I were working by myself, and then the tutor might share their solution. Usually my solution would work, but their solution has better design principles. And usually, hearing their solution, I’m able to correctly guess / extrapolate what those design principles are. I’ll paraphrase, and then take notes on those as well.

Within a day or two (a few days at the most), I’ll review all my notes from the session during batched notes review, and make Anki cards for all of the new syntax, so that the fiddly procedural details are “at hand” for me the next time I need to use them.
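
For instance, a syntax card might look something like the following (a hypothetical illustration, not one of my actual cards):

```python
# Hypothetical Anki card for a fiddly bit of Python syntax.
# Front: "Read all the lines of a file into a list, stripping trailing newlines."
# Back:
with open("notes.txt") as f:
    lines = [line.rstrip("\n") for line in f]
```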

After we’ve completed a large section of the code, I might spend a whole session or more walking through the program flow: where it starts, and which functions trigger which other functions. In the process I’ll streamline the code or rename variables to be easier to follow, and leave comments describing what the code is doing, but mainly I’m consolidating and compressing my understanding of the whole interlocking machine of that code section.
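
As a small illustration of what that kind of pass can look like (hypothetical code, not from one of my actual projects):

```python
# Before the walkthrough: terse names, and no record of intent.
def proc(d):
    r = [x for x in d if x["s"] == "open"]
    return sorted(r, key=lambda x: x["t"])

# After: renamed so the call site reads naturally, with comments
# summarizing what the function does for future reference.
def open_tickets_oldest_first(tickets):
    # Keep only tickets still marked "open", then order them by creation
    # time so the caller can work through the backlog from the oldest.
    open_tickets = [t for t in tickets if t["status"] == "open"]
    return sorted(open_tickets, key=lambda t: t["created_at"])
```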

I might also spend whole sessions refactoring my code, if the way the project developed has left its structure awkward.

I’ll hack away on a project like this, with the tutor’s help, until I get my feet under me.

Reading documentation

One specific sub-skill to develop on the way to being able to program independently is reading documentation.

I don’t have much of a handle on how to do this yet, but I suppose that I should use the same basic pattern as above.

For any particular piece of functionality of a given library, I start by trying to look it up myself. Maybe I should read for a time-boxed 5 minutes, narrating my thought process out loud as I go, and then at the end of the 5 minutes have the tutor offer feedback on my search process: were there hints that I missed about whether a given bit of documentation had the info I wanted?

As I mature

Eventually, I’ll get familiar enough with a given tech stack that I know the key steps for getting started on a project and the basic alphabet of functions and syntax to compose.

When I reach this stage, I’ll start projects on my own: sitting down and trying to build the tool that I want to use.

However, I’ll still have a low bar to booking a tutoring session, to ask for help if I get stuck.

If I don’t know how to build something, I’ll ask ChatGPT (or similar), including asking followup questions. But if I don’t feel like that’s giving me a conceptual model or it isn’t solving my problem, I’ll book a session with a tutor to go over it.

And when I’m debugging, I typically don’t spend more than an hour trying to root out the bug before either asking a friend or hiring a tutor who has more familiarity with the specific tech stack that I’m using.