I know I’ve been churning out a lot of blog and so there’s a sense in which I should turn my attention elsewhere for a while. But this interesting idea occurred to me.
If you are reading this then you have heard of the von Neumann–Morgenstern utility theorem. I don’t know why I even bothered to link it.
It’s a reminder as to why rationalists are obsessed with this concept of a utility function. Why are we trying to maximize the expected value of this thing? Where does this function come from?
The theorem starts off by assuming you have preferences, which seems reasonable enough – not just preferences over certain outcomes, but preferences over situations where you don’t know what the outcome will be (but are somehow able to assign exact probabilities). These preferences are supposed to be ordered: if you’re not indifferent between two options then you actually prefer one over the other, and those preferences are transitive.
There are a couple of other axioms: continuity and independence.
From these restrictions a utility function can be built – and it’s unique up to a positive affine transformation, i.e. a constant offset and a positive scaling factor.
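That uniqueness claim is easy to check concretely. Here’s a minimal sketch (my own illustration – the outcomes, numbers, and the particular transform are all arbitrary) showing that a positive affine transform of a utility function ranks lotteries identically:

```python
def expected_utility(lottery, utility):
    """lottery: dict mapping outcome -> probability."""
    return sum(p * utility[o] for o, p in lottery.items())

# An arbitrary utility function over three outcomes.
u = {"great": 10.0, "ok": 3.0, "bad": -2.0}

# Two lotteries: a coin flip between the extremes, versus a sure thing.
A = {"great": 0.5, "bad": 0.5}   # expected utility 4.0 under u
B = {"ok": 1.0}                  # expected utility 3.0 under u

# A positive affine transform of u (offset 7, positive scale 3).
v = {o: 3.0 * x + 7.0 for o, x in u.items()}

# The ranking of A vs B is the same under u as under v.
assert (expected_utility(A, u) > expected_utility(B, u)) == \
       (expected_utility(A, v) > expected_utility(B, v))
```

Any negative scale, by contrast, would reverse every preference – which is why only *positive* rescalings are allowed.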
A lot of people look at those, go “OK, they sound reasonable, so yay utility I guess”, and remain confused about what their own utility function might be, or whether they’re even supposed to have one.
That’s probably true for me as well, but how do I debug that?
One way, of course, is to reject something from the theorem. We could reject the notion of probability or at least say that the subjective uncertainty we face in real life is a different kind of beast. We could reject the ordering of preferences. We could reject one or both of those other funny ones that don’t sound quite so intuitive. We could say that the entire construct would be so complicated in practice that we have no idea what our utility function would be. We can talk about infinite ethics, anthropics, logical uncertainty, bounded computation and other kinds of theoretical limit. There are probably others that I’ve forgotten.
In this post I’m going to take the stand that all of those nitpicks are missing the point. At least one is probably true, but that doesn’t make VNM Utility worthless. Even if we don’t take it as gospel, the theorem is still telling us something. What’s it telling us?
One of the possible breakdowns we could see is preference cycles. As mentioned, if your preferences aren’t transitive then the premises don’t all hold, and so the theorem doesn’t apply. A classic (apparent) example of this, from the field of ethics, is the repugnant conclusion.
Specifically it’s about population ethics. Its premises are:
- (1) yay happy people: introducing people whose lives are net-positive is a win, even if there are other, happier people around.
- (2) some sense of egalitarianism: making the happiest people’s lives somewhat worse is justified if it makes the unhappiest people’s lives sufficiently better.
- (3) some sense of average utilitarianism: a large population of only-just-net-positive lives is worse than a small population of awesomely happy people.
This introduces a preference cycle. The gimmick is simply that we start off with a small population of really happy people, add a whole bunch of somewhat-happy ones, smooth things out to some kind of average happiness and then observe that we’ve got a lot of only somewhat happy people and would rather have had the original bunch of really happy people.
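The cycle is easy to see with toy numbers (all of the figures below are mine, purely illustrative):

```python
# Each population is a list of happiness levels, one per person.
small_happy = [100, 100]                  # the original really happy few
with_extras = [100, 100, 5, 5, 5, 5]     # after adding some net-positive lives
averaged    = [37, 37, 37, 37, 37, 37]   # after smoothing things out

def average(pop):
    return sum(pop) / len(pop)

# Premise (1): the four new lives are all net-positive, so this step is a win.
assert all(h > 0 for h in with_extras[2:])
# Premise (2): the redistribution helps the unhappiest a lot (5 -> 37),
# and total happiness doesn't even drop (220 -> 222).
assert min(averaged) > min(with_extras) and sum(averaged) >= sum(with_extras)
# Premise (3): yet we'd rather have had small_happy than averaged...
assert average(small_happy) > average(averaged)
# ...which closes the loop: every step was an "improvement", but the
# endpoint is dispreferred to the starting point.
```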
So in ethics we would call this an “ethical dilemma” or something. For consequentialist ethics it’s even worse, though, because to get consequentialism to work you basically have to fold your “ethical” preference function in with your own personal one, and get a single function representing what you really want – whether for ethical reasons or your own personal enjoyment.
In this case the ethical cycle becomes a cycle of personal preferences. Why’s this bad?
The classic answer is the Dutch book problem. I don’t actually know why we pick on the Netherlands in this particular piece of terminology, but somehow that’s what it got called.
Basically it says: oh, so you’d prefer it if there were some extra somewhat happy people? Would you give up a hard-earned dollar to make that happen?
It’s not unreasonable. The entire premise of the donation side of effective altruism, and to some extent charities in general, is that you spend money in order to buy some more ethically preferable situation.
The Dutch bookie goes on to ask you for money to flatten out the happiness to some average, and then again to take you back to your original situation. Each time you’re willing to do it, because after all the result is preferable – and presumably things can be tweaked so it’s sufficiently preferable to be worth at least $1 each time.
And after each loop the bookie makes $3 and you gain precisely zero ethics. Just like with the Escher staircase, you’re gonna be pretty tired by the time you reach the top.
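The loop can be sketched as a toy simulation (my own construction – the three-state cycle and the dollar amounts are arbitrary):

```python
# Cyclic preferences: A < B, B < C, C < A. The agent pays $1 per "upgrade".
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (worse, better) pairs

def trade(state, offer, wealth):
    """Accept the offer for $1 whenever it's preferred to the current state."""
    if (state, offer) in prefers:
        return offer, wealth - 1
    return state, wealth

state, wealth = "A", 10
for offer in ["B", "C", "A"]:   # the bookie walks the agent around one loop
    state, wealth = trade(state, offer, wealth)

# Back to the original state, $3 poorer -- and nothing stops a second loop.
assert state == "A" and wealth == 7
```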
Unfortunately this doesn’t quite serve as an intuition pump. In reality we’re hardwired not to be taken advantage of quite this easily. Once we realize we’ve entered the situation I’ve described, we’ll cease trading – not so much because our preferences have magically aligned themselves, but because we’re no longer willing to trade hard currency for them.
(There’s a similar-sounding way to take advantage of people that apparently actually does work, except it’s about greed and not ethics and it involves multiple players – and it’s called the dollar auction).
But if we kludge our system to not pay for stuff we want, then we hit a problem. We’ve taken a global optimization headache and turned it into a local one – there’s now stuff that we’re not willing to trade off against each other.
That sounds a lot like what “not having a utility function” looks like: stuff that you want, but wouldn’t give up money to get – even if second-order effects like trust and signaling could somehow be resolved to your satisfaction.
That’s all just the background though.
You can say that if you don’t behave in a way consistent with such-and-such a principle, then you’ll end up with something that’s bad according to some other principle. And yet you still manage to muddle through somehow. Who even cares that someone wrote a thing saying what you did didn’t make sense, and someone else wrote another thing that agrees?
What I need, then, is things that show me just how badly I’d fail to get what I want by falling into a trap like this. Or to put it another way: how much better things could be if I knew how to avoid such traps.
My own traps won’t be the one I just described. They’re probably not even going to be something like “how valuable is money invested in reducing AI risk compared to money invested in reducing the problems of global poverty” (where a lot of the uncertainty is empirical rather than about preferences as such, though similar considerations apply if I flip-flop on my probabilities).
My guess is that the biggest losses, and hence the biggest potential wins, come from preference weirdness in my own life day to day and moment to moment. It’s 1:28am and I have work tomorrow – was writing this really worth the time? If so were all the other things that I did ahead of it also worth the time? Back when I was losing at Beeminder, was it worth paying all those fines in order to not do the stuff I thought I wanted to do anyway? And so on.
So coming up with a table of how much I value different things might be valuable. It’ll be interesting to see, at least, how it falls down. Will I be too embarrassed about what I want to write down accurate numbers? Will it be too complex? Will my revealed preferences be so far from the ones that I write down that I’m effectively lying to myself, only letting one part of my personality express itself numerically? Is there some kind of grand weirdness going on in my mind that makes the whole idea hopeless?
What will the table even look like? (One number per thing might not be enough – it might make more sense to express different kinds of value in separate columns and decide how to funge them at the end. And the value of things depends on how much I’m already getting, and maybe on other factors too).
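As a strawman starting point, the table might look something like this (every activity name, number, and exchange rate below is made up purely for illustration):

```python
# Separate columns for different kinds of value, in arbitrary units.
value_table = {
    "writing_blog_post": {"fun": 3, "long_term": 5, "social": 2},
    "sleep_before_work": {"fun": 1, "long_term": 8, "social": 0},
}

# Exchange rates between the columns -- deciding these is the hard part,
# and they'd presumably shift with how much of each I'm already getting.
rates = {"fun": 1.0, "long_term": 1.5, "social": 0.5}

def funge(row):
    """Collapse the separate value columns into one comparable number."""
    return sum(rates[k] * v for k, v in row.items())

scores = {name: funge(row) for name, row in value_table.items()}
```

The separate columns defer the hard tradeoffs: each row can be filled in honestly on its own terms, and all the fudging gets concentrated into the exchange rates at the end.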
We shall see.