On Double Crux tests and tournaments

Most of the tests I’ve heard people pitch for DC don’t seem very valuable to me, and I want to at least gesture at why.

Other folks seem to be thinking of Double Crux as a complete method, to be directly compared with other methods: “which one works better”. I think of Double Crux as one (very important) pattern in an ensemble for the overall goal of bridging disagreements. “Testing Double Crux”, as I often hear people talk about it, sounds to me a little like “testing bank shots” in basketball: it is clearly useful sometimes, it isn’t always the right thing to go for, and it depends heavily on personal skill.

I think that example overstates it somewhat: Double Crux is more of a broad framework for disagreement bridging than bank shots are for basketball. And that’s not to say that you can’t test bank shots: it’s plausible that there are superstitions about them, and that they aren’t as effective as many practitioners believe. But the value of information seems lower to me (at least at this stage, where approximately no one has put in more than 20 hours explicitly training disagreement bridging, compared to basketball, which has hundreds of highly skilled experts).

I would be more excited about organizing a “disagreement resolution tournament”, where experts who have developed their art and trained to excellence compete, rather than (for instance) a setup where we give 20 undergrads a 30-minute Double Crux lecture with 30 minutes of practice, and compare them to a control group.

(That second thing isn’t useless, but I care a lot less about developing shallow tools that are helpful for ~0-skilled folks, out of the box, than I do about deep experts who increase the range of problems (in this case, disagreements) that humanity / the x-risk ecosystem can solve at all.)

The logistics of such a tournament seem hard to make work, because there’s no obvious way to standardize the disagreements to be resolved, and in practice there are very few highly skilled experts of differing schools. So the value of information in 2019 still seems low. But it seems more promising than most of the tests I hear proposed.

3 thoughts on “On Double Crux tests and tournaments”

  1. This seems like a weird position. If Double Crux is a broad framework to bridge disagreements, wouldn’t you expect people to be better at bridging disagreements after learning the framework? If so, that seems like something you could test compared to control. If not, then that seems good to clearly say: I get the sense that there’s a widely shared vibe of ‘double crux is good and you should learn it’, and if it in fact can’t be learned (or if it can’t be learned given ~2 hours work), then that vibe should probably be corrected. But maybe you disagree that this vibe is out there?

    1. Let’s see if I’m being inconsistent or Invisible-Dragoning (https://www.lesswrong.com/posts/CqyJzDZWvGhhFJ7dY/belief-in-belief).

      Some things I currently believe:

      I’m not sure exactly what the vibe is. It seems positive. I would guess that there is some social push for things that CFAR promotes, and an order of magnitude smaller skeptical counter push?

      I do think that most people do not learn the key moves in ~2 hours of group instruction, though I have occasionally seen that happen. (I’m experimenting with ways to improve this situation this very day, though making the shallow tools more accessible is not my priority.) I was going to say that I make a point to say that every time I teach the Double Crux class at a CFAR workshop. But that’s not true: I only imply it. I will make a point to say it explicitly in the future.

      I think most of the value of Double Crux being taught widely is the concept of a “crux”, and to a lesser extent, the concept of a “double crux”. A priori, it seems like there is a very strong case that having these shorthands is helpful for changing one’s mind (basically, because it makes the mental parsing of “what is my crux?”, “is that a crux?”, etc. into easily available mental motions). The concept of a crux has also become widespread enough that it is regularly used in (rationalist) discourse.

      If a well designed test revealed that having the concept of a crux made no difference whatsoever in people changing their minds, I would be shocked, and would want to evaluate further to see what’s happening. (Is it the case that individuals change their minds at the same rate, regardless of the language they have access to?)

      If a well designed test showed that people with 2 hours of DC training are no better at resolving disagreements than controls, I would not be surprised, because of “sparse reward” problems, and large training effects. I would shrug, get ready to have lots of people complain to me about how Double Crux has been disproven, and continue working on the problems that I’m working on.

      (If a well designed test showed that people with 2 hours of DC training are _substantially_ better at resolving disagreements, I would be a little surprised, and then keep doing the same things that I’m doing, though I guess that would be helpful data for others.)

      If a well designed test showed that some folks that I consider highly skilled were no better at resolving disagreements than controls, I would be shocked. First, I would investigate the “controls” and see what magic they were using. If they appeared to have no magic, I would hugely reevaluate: this would entail my concrete impressions being very wrong, which would imply that my specific plans are misguided, _and_ that my general methodology is misguided. I would halt, melt, and catch fire.

      (If that test showed highly skilled folks doing better than controls, I would shrug. I predict this.)

      What about a well designed test of people who have “medium skill”, who have the basic Double Crux moves down (this includes you, Daniel)? I’m not sure how much of the fact that these people are more effective in conversation is due to general ability. But I predict that you would still see an effect: two people with the DC moves down would be more effective at resolving not all, but a substantial portion, of disagreements than two similarly talented but DC-naive controls.

      If the result was otherwise… I would update down on my plans mattering some, but not that much. It is still obviously the case that very smart people can’t resolve some crucial disagreements, and I have concretely seen specific instances of conversational gridlock resolved by applications of conversational methodology.

      Does any of that help?

      (Also, do you read all my posts as they come out?)

      1. Your comment helped me understand what you’re saying and where you’re coming from more than the post did.

        > I do think that most people do not learn the key moves in ~2 hours of group instruction, though I have occasionally seen that happen.

        OK, good to hear this explicitly stated (this wasn’t obvious to me beforehand). Is ‘most’ here like 70% or like 95%? If it’s more like 70%, and you can see who has the key moves by reading transcripts of conversations (or better, by how they behave in the class), then it still seems like you can do a test where you compare those who have the key moves to those who don’t.

        > If a well designed test revealed that having the concept of a crux made no difference whatsoever in people changing their minds, I would be shocked, and would want to evaluate further to see what’s happening.

        Huh – I think I would expect the effect size to be small among people who already think about thinking the amount the median member of the rationality community does. I’m more bullish on the effect size of getting the concept that you should try to find cruxes, which is maybe what you mean.

        > Also, do you read all my posts as they come out?

        I subscribe to the RSS feed of this blog, so I see posts as they come out, and read the ones I think look interesting/promising.

        A lot of your statements about hypothetical tests involve ‘resolving disagreements’. Is there any chance you could flesh out more what that means? Is it more like ‘people can pass each other’s ITTs’, ‘people share generative models but disagree on inputs’, or ‘people agree’?
