Good and Real, chapter 7 notes (Final)
Two-Boxers Aren’t Conscious
I’m in a reading group for “Good and Real” by Gary Drescher. These are my notes for chapter 7. This is the final post in this sequence of reviews. Quotes are sometimes slightly modified by me for readability.
This chapter attempts to explain that ethics is applied decision theory. I actually agree with this, but I think Drescher is confused and presents this badly, in part because he is unwilling to accept some of the implications.
Choice is a matter of acting for the sake of what would then be the case
I’m glad we got a formal definition of choice, but I really wish it had been introduced far, far earlier. He’s been talking about “choice-supporting subjunctive links” for several chapters. Honestly this is such a big SNAFU that I suspect he must have said it earlier and I missed it?
in the Prisoner’s Dilemma, there is in fact a means–end link from the action of cooperating with another, to the goal of another’s cooperation toward oneself, even in situations where cooperation cannot cause such reciprocity.
Good summary, and shows how the Prisoner’s Dilemma is a variation of Newcomb. I’ve already internalized the intuitive desirability of one-boxing, and if I can marry this to cooperating in Prisoner’s Dilemmas that will strengthen my cooperate instinct. :)
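To make the PD-as-Newcomb structure concrete, here's a minimal sketch. The payoff numbers are the standard textbook ones, but the correlation parameter and the framing are my own illustration, not Drescher's:

```python
# Toy Prisoner's Dilemma against a correlated opponent. Payoffs are the
# standard textbook values (higher = better); treating the opponent as playing
# my move with probability `correlation` is my own illustrative assumption.

PAYOFF = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def expected_payoff(my_move: str, correlation: float) -> float:
    """Expected payoff if the opponent mirrors my move with probability
    `correlation` (i.e. we're running sufficiently similar deliberations)."""
    mirrored = PAYOFF[(my_move, my_move)]
    other_move = "D" if my_move == "C" else "C"
    unmirrored = PAYOFF[(my_move, other_move)]
    return correlation * mirrored + (1 - correlation) * unmirrored

for corr in (0.5, 0.8, 0.99):
    print(corr, expected_payoff("C", corr), expected_payoff("D", corr))

# At corr = 0.5 (no correlation) defecting wins, exactly as causal reasoning
# says. Above corr = 5/7 (about 0.71), cooperating has the higher expected
# payoff, which is the Newcomb structure: the "prediction" is just the other
# player running a deliberation similar to yours.
```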
The biggest problem here is that laying out the case intellectually doesn’t actually convince the emotions, which are responsible for almost all action. So at best this is Step One, with a lot of work still to be done in the “convince the emotions” part of the soul-hack.
In this light, it’s remarkable how powerful art and culture are. Reprogramming the soul is a massive lift, and I hadn’t appreciated that before.
Next, Drescher works to strengthen the intellectual case for the Newcomb → Prisoner’s Dilemma → Morality pipeline by saying you don’t need to be playing with Omega to feel secure in one-boxing; you just need to be convinced that the other agent is also competent enough to work out the correct choice (that one-boxing is ideal, and that these scenarios are special cases of one-boxing).
I need not anticipate what each of your atoms or even each of your neurons does when you add the numbers. … Unlike the science-fiction simulation, though, this high-level simulation depends on our mutual competence.
Similarly, you can predict my answer to an addition problem even if we have different algorithms, as long as we know we’re both competent to solve the problem
[imagine solving Newcomb to be as widely known as addition] I can thus set up a Newcomb’s encounter using that folk-psychological simulation of your choice deliberation, in place of an atom-by-atom simulation. I would place $1,000,000 in the large box (whether it is opaque or transparent), confident that you would make the correct choice and take the large box alone
Prices can exist because people are numerate. Signs can exist because people are literate. It may be a great project to create popular children’s stories that teach Newcomb’s problem intuitions at an age-appropriate level so people can be high-trust. Possible difficulty - you can easily verify if someone is numerate or literate. Can you do that to determine if they’re ethical? All signals seem like they are fakable in a way numeracy and literacy are not.
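To put a number on “competent enough”: here's a quick expected-value sketch of mine, using the standard $1,000,000 / $1,000 payoffs and a predictor that's right with probability p:

```python
# Newcomb expected values for a predictor with accuracy p. Payoffs are the
# standard ones; treating the prediction as probabilistic evidence about the
# big box's contents is my own framing of the sketch.

BIG, SMALL = 1_000_000, 1_000

def ev_one_box(p: float) -> float:
    # The big box is full iff the predictor (correctly) predicted one-boxing.
    return p * BIG

def ev_two_box(p: float) -> float:
    # A two-boxer gets the small box for sure, plus the big box only if the
    # predictor (incorrectly) predicted one-boxing.
    return SMALL + (1 - p) * BIG

for p in (0.5, 0.51, 0.9, 0.999):
    print(p, ev_one_box(p), ev_two_box(p))

# One-boxing pulls ahead as soon as p > 0.5005: the other party only needs to
# be a bit better than a coin flip at modeling you, not an atom-by-atom
# simulator. That's why mere mutual competence is enough.
```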
the rules that express the algorithms for human-level intelligence apply to an even more vast set of situations that involve a vast number of problem types. Thus, as with basic causal laws, basic problem-solving rules apply to exponentially many situations, making those rules especially applicable and explanatory
Earlier Drescher had set up the Explaining-Away principle, which gives strong preference to rules that are widely applicable and explanatory across broad domains. Here he’s arguing that this applies to the Newcomb → Morality pipeline. It’s self-consistent to do so, but it feels a lot like writing a prophecy and then writing the fulfillment of that prophecy… I’m not giving you extra credibility points for it. XD
If we all behaved in a predatory fashion to the extent that we could get away with it, to the extent that we could reasonably expect a net gain, never refraining from such profitable behavior simply because it is wrong or unfair, then (arguably) our level of cooperation would be dramatically less than it is in fact.
This works because we all already know about the reason you don’t kill a healthy patient to harvest his organs to save five people. I don’t think he does a good job laying out that case here, I think he barely even tried TBH, but it’s (arguably) ok because we’re already familiar with it.
Drescher hits a major stumbling block on the issue of “who should we actually cooperate with, tho?” because his theory points out there’s no reason to cooperate with those who can’t reciprocate.
how about a policy Act for the benefit of those who have property Y, where Y is shared by you and by all your likely potential reciprocators?
One problem is that if that were your rational strategy, it would also be the rational strategy of potential reciprocators who might benefit from a policy that uses an even narrower criterion Z that encompasses them and their reciprocators, but excludes you.
Uh… yes? He says this is a problem, but this is simply true. The only way to be included in such a cooperative exchange is to find some way to be able to reciprocate! If you have nothing to offer, even in expectation, there is no reason to be included. That’s just suicidal altruism. It would be like infecting yourself with Streptococcus for the benefit of the strep bacteria.
A policy of respecting each entity’s existence equally, for example, would have us fastidiously value a bacterium as much as we value a person. Following such a policy would not be a means to achieving your own goals
I think he just surrendered.
If various random events had transpired differently, you might have been a non-Y yourself.
What an interestingly irrelevant true statement. He appeals to Rawls’ Veil of Ignorance here, but Rawls’ Veil is ridiculous for the reasons he just articulated. “I” might have been a Streptococcus bacterium, so why should the Veil exclude them?
(I have actually had someone argue to me that C. elegans should be included in the Veil, because with their 302 neurons “he” could indeed have been a C. elegans instead. This points to the lunacy of the Veil — he could be me if we rewound time and he had my genes, memories, and life-circumstances & life-history. I could be him if we rewound time and I had his genes, memories, and life-circumstances & life-history. But that’s already the case! Time already ran under those conditions. To say that we could be swapped is a meaningless statement in this case. The modern extrapolation of Rawls’ Veil is just a secular adoption of the Christian soul myth, and it’s equally dumb.)
Either he has to concede that it’s actually impossible for “you” to have been a non-Y, or that it is appropriate to exclude some non-Ys. He says there’s “a balance” that must be struck, but…
I make no pretense to have offered a detailed account of how to strike the appropriate balance
So handy! At this point I am beginning to suspect that he isn’t determining morality from first principles: he has a decision theory he likes and is trying to force it to conform to the morality he already prefers.
The discussion of the dual-simulation transparent-boxes variant of Newcomb’s Problem (sec. 6.3) argued that it can make sense to act in pursuit of an entire probability-distribution as to how things might have turned out—even if you already know how things did turn out
Feels like a trick. He claims this justifies Rawls’ Veil, and if a method can justify a false thing (e.g. Rawls’ Veil) it has some serious flaws, so now I think he just admitted it’s a trick. Which is unfortunate; it kinda kneecaps the whole book. :(
Intuitively, there are two basic desiderata for a system of ethics: that the system prescribes behaving well toward other people, or prescribes behaving with respect for certain principles; and that the system provides a rationally compelling reason to behave as it prescribes.
Satisfying either of these is notoriously easy; satisfying both at once is notoriously hard.
I must have a very wrong theory of mind or morality, because this sounds like the opposite of true. Giving a compelling reason to behave well or follow certain principles is prescribing those principles. If there aren’t compelling reasons to follow those principles, they aren’t being prescribed! That’s what that word means! How are you using words if these things are coming apart?
The apparent conflict between ethical motivation and self-interest arises, I suspect, from mistaking causal self-interest for self interest generally.
“How would I like it if others [...]?” questions are not usually recognized as grounded in the pursuit of self-interest. That that influence turns out to be so grounded does not undermine its genuinely ethical nature; it should just change our theory of what genuinely ethical considerations are.
He seems to be saying: “People think ‘how would I like it if someone did X to me?’ isn’t a self-interested question. But it IS a self-interested question. Those people assume that being self-interested would mean it can’t be a genuinely ethical question, because ethical questions aren’t supposed to be self-interested. But they are wrong.”
I’m glad I live in a bubble where these people don’t exist; that sounds so bizarre to me. Of course ethics is self-interested. Any suicidal ethics would remove itself and its followers from existence, which is net-negative.
we need not fear that this new sense of self-interest permits us to ‘‘cheat’’ the way purely causal considerations do. For here, the subjunctive entailment of reciprocity holds even in the absence of any causal link to the reciprocal benevolence. Reciprocity thus becomes inescapable, categorical—except, perhaps, in the case of a hypothetical entity so powerful as never to be, and never to have been, or to have had the possibility of being, very dependent, even indirectly, on others’ benevolence.
Big oof. This is a direct continuation of the previous quote. So if I understand correctly, it can be restated as “Yes, ethical questions are self-interested. That doesn’t mean you can cheat and exclude some entities (including non-reciprocators) from your reciprocity calculation (see prior notes on that calculation). Reciprocation is mandatory and inescapable.”
This sounds like just a straight-up religious declaration to me. It’s much like “I demonstrated that God is That Which Necessarily Exists, so His existence is inescapable.” I guess that’s great if it helps you stick to the ethical code you endorse, but boy is that not a Compelling Reason for anyone else.
The proposed basis for my deserving your moral regard is that your treating me well subjunctively entails others’ doing likewise to you (which is to your benefit). But the same does not hold for a rock
Here’s the core of the issue, as we in the reading group were able to discern it:
You obviously don’t include rocks in morality calculations, they can’t reciprocate. You obviously do include your peers. Between rocks and your peers there’s a lot of space… where do you draw the line? Decision theory says you draw the line at entities whose potential cooperation can be affected by your potential cooperation. Mostly this is great! But it has a few weird edge-case implications:
We should treat nonconscious agents well, as long as they can execute decision theory correctly. If a chatbot provably has no inner experience, but can do decision theory, you should treat them as a person rather than a tool.
It’s ok to hurt rocks that feel pain. They cannot cooperate, they cannot execute decision theory, they are not potential cooperation agents.
This isn’t just theoretical trivia, because animals can’t execute decision theory, but can feel pain. Historically, they have not been included in ethical consideration, aside from a few special exceptions. Drescher doesn’t like this.
In fact there are several places where Drescher says something like “Fortunately, this decision theory actually allows us to avoid conclusion X!” This is my biggest tell that he isn’t actually deriving morality from decision theory. His bottom line is already written, it is something else, and he’s trying to fit decision theory around it. This negates the entire point of these chapters IMO. If you derived your ethics from something else, please tell me what that something else is instead. I don’t want the layer of justification you put over it.
Retribution is decision-theory approved!
Retribution may rationally have precisely the goal of not having undergone the harmful act in the first place (just as choosing just the large box in Newcomb’s Problem—even if it is empty—may rationally have the goal of the box not having been empty in the first place).
if self-harmful retaliation is not the rational choice, then there is no good reason for an attacker to fear such retaliation from a rational opponent, leaving that opponent without a credible deterrent.
I said previously that I still have a very hard time feeling like I should one-box in the Omega-Error Case: where both boxes are transparent and the big box is empty. Just feels bad. Drescher has hit on how to make it intuitive! Choosing to retaliate after you have been wronged, when the retaliation costs you far more than any possible benefit and the harm is already done, is equivalent to one-boxing when Omega made an error. You must retaliate/pick empty, because if you were the sort of entity who did not retaliate then you would always be attacked/never get the million dollar box. The only way to be the entity that doesn’t get attacked/does get the million dollars is to be the entity that actually does retaliate/does pick the empty box.
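To pin down the retaliate/one-box parallel with some toy numbers (entirely mine, not Drescher's): compare the policies, not the individual acts.

```python
# "Retaliator" vs "non-retaliator" as policies. The point is that would-be
# attackers can (imperfectly) predict your policy, so the policy you actually
# have changes how often you get attacked. All numbers are illustrative
# assumptions of mine.

HARM_FROM_ATTACK = -100      # cost of being attacked at all
COST_OF_RETALIATING = -20    # extra self-harm from carrying out retaliation

def expected_utility(is_retaliator: bool) -> float:
    if is_retaliator:
        p_attack = 0.05                                    # credible deterrent
        per_attack = HARM_FROM_ATTACK + COST_OF_RETALIATING
    else:
        p_attack = 0.60                                    # no deterrent
        per_attack = HARM_FROM_ATTACK
    return p_attack * per_attack

print(expected_utility(True))   # 0.05 * -120 = -6.0
print(expected_utility(False))  # 0.60 * -100 = -60.0

# After an attack, retaliating is locally pure loss, just like taking only the
# visibly empty box. But only the entity that actually would retaliate gets
# the low attack rate, and only the entity that actually would one-box gets
# offered the full box.
```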
If I get nothing else out of this book, I’m grateful for this coupling of theory to emotional understanding.
I’m also awed that evolution managed to program into us on a primal level such an unintuitive act (one-boxing even when you can see it’s empty)! Without having any understanding of decision theory at all, this eldritch process made us want to choose the bizarrely correct action, at least in the cases most likely to occur in reality. Crazy!
Cooperating across time
Re smoking — if you want to quit, you can cooperate with your future-self by not smoking today. Or…
[you could] do it just once more and then quit—a choice that, unfortunately and paradoxically, then repeats itself each next time, forestalling quitting forever.
you do, in effect, choose all at once
The only way to cooperate with your future-self is to not defect when the option comes up, because all sufficiently-similar versions of you will make the same choice. You choose for all of them, across time.
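A quick sketch of the “choose all at once” point, with illustrative numbers of mine rather than anything from the book: since every sufficiently-similar future-you runs the same deliberation, “just once more” isn’t one cigarette, it’s the whole series.

```python
# "Just once more" vs "quit now", evaluated as a policy over every future self
# who will face the same choice with the same decision procedure. The payoff
# numbers and horizon are illustrative assumptions of mine.

PLEASURE_PER_SMOKE = 1     # immediate payoff of smoking today
HEALTH_COST_PER_SMOKE = 3  # long-run cost attributable to each cigarette
DAYS = 365                 # how many times the "just once more" choice recurs

def total_utility(policy: str) -> int:
    # Because all similar future selves decide the same way, today's policy is
    # effectively the policy for every one of the DAYS choices.
    if policy == "just once more":
        return DAYS * (PLEASURE_PER_SMOKE - HEALTH_COST_PER_SMOKE)
    return 0  # "quit now": no smoking on any of the days

print(total_utility("just once more"))  # -730
print(total_utility("quit now"))        # 0

# Each individual "just once more" looks attractive in the moment (+1 now, a
# diffuse -3 later), but chosen as a policy across all your future selves it
# is clearly the worse option.
```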
If one of your neurons—or all of them—were about to be replaced by functionally equivalent silicon prosthetics, would you have any less reason to care now about your post replacement self than if your original neurons were to remain undisturbed? By the present account, the reason to care is grounded in the subjunctive link between your choice and a symmetric choice made by another—or a symmetric choice made by yourself at a different time
that subjunctive link depends only on both of you using similar competence to solve symmetric problems.
Here he’s saying more concretely that decision-theory executors should be treated as moral beings. What matters is both actors being able to solve and execute correctly, even if one of them thinks in silicon rather than carbon.
On Searle’s Chinese Room:
If you were to understand the complicated system you were hand executing, you would indeed understand Chinese, and far more—you would also understand how the program being executed understands Chinese. The program being executed is like a Rosetta stone on steroids. It would be replete with, for example, memories that encode visual and audio imagery of dogs, subjunctive representations of how dogs would be expected to act under various circumstances, taxonomies relating dogs to other organisms, and so forth—as well as representations of linguistic constructs related to dogs
Basically an argument that Searle’s room doesn’t just know Chinese; it’s also, in effect, a person. Interesting when applied to today’s LLMs, since our LLMs are actual Chinese Rooms for English. It’s neat that the people who saw Searle pose the thought experiment and answered “well, the room understands Chinese” lived long enough to see it empirically tested.
The Full Thesis
Drescher lays out his full thesis, what all the chapters of his book so far have been building towards. Here it is, with commentary:
1 We can in principle show that when you speak of (or otherwise report, or just think about) your conscious experiences, the events that your report ostensively points to—the events that in fact give rise to your report— turn out to be certain events in your brain
Agreed. Experiencing is the result of brain activity.
2 The events pointed to can be described at various levels of abstraction: in particular, as computational events (abstracting above the underlying implementation), or as biochemical events (not abstracting above the implementation). The computation includes the smart recording and play back of various events, using terms of representation that designate interrelatedness, implementing an understanding of those events
This is his Cartesian Camcorder proposal. It’s interesting, though unsupported. Perhaps this lack of support would be a problem, but it doesn’t matter because the idea is entirely irrelevant. Nothing in the rest of the argument depends on this… or even references it. I don’t understand why it’s in this book at all. If you remove point #2 from this list, nothing changes.
3 You have reason to promote others’ interests (or your own deferred interests)—even beyond the extent to which you may directly value those interests
Yes. I think he didn’t fully get into the range of reasons there are for promoting others’ interests. One-boxing is a strong reason to cooperate with other competent decision-theory executors! But there are many other reasons to promote others’ interests as well. Trying to shove everything into a special case of Newcomb’s Problem really handicapped him, as we saw earlier when he couldn’t draw a moral line where he wanted to using just Newcomb.
4 Participating in (partly acausal) subjunctive reciprocity is a matter of performing the right sort of computation
I believe this is saying that a computation can deduce #3. Fair enough.
5 That an entity has the right kind of mental events, and in a way that obliges us to care about or respect its interests, is a central aspect of what we mean by conscious. Performing the right sort of computation (under a nonjoke scheme, but regardless of implementation) does qualify an entity to have its interests respected, fulfilling that aspect of the meaning of conscious
This is saying “I define consciousness to necessarily include #3.” This does mean that an entity that cannot compute the decision-theoretically correct answer to Newcomb’s Problem is not conscious! I actually really love this. It does exclude all animals and Causal Decision Theorists. He does stress in other places that we must reciprocate cooperation with everything (except rocks & bacteria?), so in practice this doesn’t stop him from treating Causal Decision Theorists as people. But in theory, two-boxers aren’t conscious. :)
I believe his goal here was to include aliens and sufficiently-advanced AIs as conscious, and he missed the consequences of making a complex computation necessary for consciousness. Perhaps he could get out of it by saying evolved instincts can substitute if they follow the decision theory well enough (like our instinct for niceness and retribution).
6 Performing the right sort of computation, in the foregoing sense, can in principle be objectively, externally verified; hence, so can consciousness.
This is mainly just reiterating that you defined “consciousness” as “the computation of #3.” But it does explicitly state that “We can test if something is conscious by testing if it can do the decision-theoretic calculation”!!!! By this metric I think the latest LLMs are conscious. I don’t think this is a valid test for consciousness, and after nearly 20 years I assume he doesn’t think that either.
Also maybe I’m misreading him? Because this seems like a very bold claim! Maybe he really was that confident in 2007 that nothing could understand and compute decision-theory-correct answers unless it was conscious?
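For what it’s worth, here’s what a toy version of that “test” might look like. This is entirely my own sketch of the idea, not a procedure from the book, and the agent shown is a hypothetical stand-in:

```python
# A toy version of "externally verify the right sort of computation": hand the
# entity the Newcomb payoff structure and check whether it returns the
# one-boxing answer. My own illustrative sketch, not Drescher's procedure.

from typing import Callable

BIG, SMALL = 1_000_000, 1_000

def passes_newcomb_test(agent: Callable[[int, int, float], str],
                        predictor_accuracy: float = 0.99) -> bool:
    """`agent` takes (big_payoff, small_payoff, predictor_accuracy) and
    returns "one-box" or "two-box"."""
    return agent(BIG, SMALL, predictor_accuracy) == "one-box"

def toy_agent(big: int, small: int, p: float) -> str:
    # Hypothetical agent that compares expected values and one-boxes when that
    # wins, i.e. it "can do the decision-theoretic calculation".
    return "one-box" if p * big > small + (1 - p) * big else "two-box"

print(passes_newcomb_test(toy_agent))  # True

# By this standard anything that can do this arithmetic passes, which is
# exactly why it reads as far too weak to serve as a test for consciousness.
```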
Anyway, that’s the thesis as I understand it: experiences are the result of brain activity; consciousness is the result of decision-theory computations; ethics is the result of getting the correct answers to those decision-theory computations; and we can test for consciousness by testing whether an entity can make such calculations.
The chapter closes with an admission:
Both in physics and in ethics, even if we accept the principles extracted from reasoning about idealized toy scenarios, the explicit application of those principles to everyday situations is often impractically complex. Anticlimactically, after all the analysis, we must revert to trusting our intuitions much of the time—
This is a terrible thing to read hundreds of pages into a book. Next time I propose we read a work on how to discover which intuitions are the ones we want our fellow entities to have and how to cause them to have these intuitions. Or as it is commonly called: Desire Utilitarianism.
We didn’t read Chapter 8, because Chapter 8 reveals the meaning of life and such knowledge was not meant for mortals. I’m assured it was kinda interesting if you’re really into Drescher, but not eventful otherwise.
I think overall reading this was net-positive utility. It was often fun and it helped me cement a couple intuitions that weren’t quite fitting. Being in a reading group helped with both of those.
The main thing I got from it, however, was a renewed appreciation of Yudkowsky’s work. His Sequences look even better in comparison to contemporary works. They are more coherent, more explanatory, and are written in a way that maximizes accessibility and retention. The way he delivers the payload of the Sequences into your emotions is far more difficult than I had realized, and is vastly more important in the long run. No wonder he inspired an entire generation of thinkers. I’m glad I was there to see them born, and now live in their wake.


