So I started writing down a list of things that I did with a view to coming up with a number for how important each one is.
This was kind of missing my own point.
If you imagine a committee of people tasked with deciding how important various stuff is, they would sort of mull over it and argue and compromise and out of that would come some numbers, and you wouldn’t know if the numbers meant anything. That’s what I would be in danger of doing if I wasn’t thinking about it carefully.
If I decide that this bright green soda that I’m drinking is worth -0.05 utils, but I’m drinking it anyway, what am I to make of that? If I decide that writing this blog post is worth 0.5 utils, but writing out a todo list is worth a whole 1.1 utils and would only take half the time, but I’m writing the blog post anyway, what am I to make of that?
Basically it tells me one of two things. Either that my behaviour is “wrong” in some sense, which I sort of knew anyway. Or that the numbers are wrong, which also would not be very surprising.
Suppose, however, I came up with numbers I thought expressed my revealed preferences. Drinking the green thing would now come out positive, because it’s clearly something I like doing as long as I haven’t already had a lot of sugar recently.
I’m sure I could come up with something representing the revealed value of meta-tasks – tasks involved in managing other tasks, like writing a todo list – even though that value needn’t match what you’d expect from comparing before/after scenarios where I’m confused about what I should be doing.
These numbers – which don’t express preference so much as a general tendency to do something – may also not agree with reality. Once I’d calibrated the numbers correctly such a discrepancy would now represent a surprise – something I thought I didn’t like, I now really really like! Or vice versa. That would seem to be useful information in some sense.
I could also look at the numbers directly, and if they seem to disagree with what I thought the thing should be worth, it also tells me something but it’s really just the same thing as before.
And yet I was saying the numbers aren’t the important part. What am I driving at?
There’s a sense in which anything can be expressed as a utility function but it isn’t an interesting one. You just assign 1 to whatever outcomes are consistent with the laws of physics and 0 to the others and it would predict your behaviour just fine (although you couldn’t compute it in practice). Instead, when we talk about utility functions informally we imagine three things:
- Abstraction. Utility is a function over high-level things that we have a mental concept of, not in general over the exact placement of individual objects.
- Continuity. Very similar things get roughly the same value.
- Homogeneity. Value is reasonably robust to things being displaced in space and time, reordered etc.
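To make the contrast with the degenerate case concrete, here’s a rough sketch in my own ad-hoc notation (nothing standard – the similarity metric d and the thresholds are just gestures at what the continuity criterion would informally demand):

```latex
% Degenerate "utility" over exact world-states \omega:
% predicts behaviour perfectly, tells you nothing.
U_{\text{trivial}}(\omega) =
  \begin{cases}
    1 & \text{if } \omega \text{ is consistent with the laws of physics} \\
    0 & \text{otherwise}
  \end{cases}

% Continuity criterion, informally: high-level outcomes x, x' that are
% similar (under some notion of similarity d) get similar values.
d(x, x') \le \delta \;\implies\; \lvert U(x) - U(x') \rvert \le \varepsilon
```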
Imagine we are thinking of the utility of “drinking a glass of milk”. The first criterion I mentioned is necessary to even understand that language, so we’ll just assume it.
The second expresses that if we wait 1 second or warm the milk up 0.5°C then it shouldn’t affect the value very much. There’s nothing in the von Neumann-Morgenstern axioms that implies this kind of continuity – continuity there is purely with respect to differing probabilities between two completely fixed outcomes.
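For reference, the VNM continuity axiom in its standard form only says this – the probability p varies while the outcomes A, B, C stay fixed:

```latex
% VNM continuity: if A is weakly preferred to B, and B to C,
% then some lottery mixing A and C is exactly as good as B.
A \succeq B \succeq C
  \;\implies\;
  \exists\, p \in [0,1] : \; pA + (1-p)C \sim B
```

Nothing in it constrains how the value changes when an outcome itself is perturbed, e.g. milk at 20.0°C versus milk at 20.5°C.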
Homogeneity is also required for the “drinking a glass of milk” to be assigned a value – there can be a lot of other things that are different about the world outside, affecting its value overall, but the difference between drinking the milk and not drinking the milk is largely independent of that. When there’s something it’s not independent of, like how thirsty we are, it’s often worth drawing attention to that.
Occasionally, small differences in an action, or differences in the order between actions, actually do make a lot of difference. From some distance away, me typing this would look extremely similar to me typing random garbage, but one clearly has more value than the other. We can get around this by adjusting the lenses through which we interpret abstraction, continuity and homogeneity, though. Generally, on those occasions when 5 seconds makes all the difference, we’re well aware of that.
Basically we have to interpret these three criteria informally. Trying to axiomatize them as was done with VNM utility would be hugely frustrating.
(Also I made them up just now. Take all this with a grain of salt).
I’m guessing that modeling all of your decisions under a single “revealed preference” function would satisfy the abstraction and continuity criteria but not homogeneity.
If it turns out not to be homogeneous, there are different ways to look at this:
- A multi-agent system where the weaker agent (the one that speaks in words and is writing this post) is usually completely not in control and just very good at making excuses. Sometimes this agent can seize control for a bit, and during those times decisions will be made differently.
- The stronger, shadowy agent doesn’t have completely stable goals. They can change, either spontaneously or at the prompting of the weaker agent.
- There really is just one utility function but it’s weirdly time-dependent or something.
I find the middle one the most interesting. There’s an idea that the shadowy agent’s goals, for the most part, are not “terminal goals” but are in the service of something else, and built into this is some kind of assumption that they really are the best way to achieve the thing. If that assumption can be dissolved then you can make progress aligning the goals between your agents without at any point feeling you have to give something up.
I like this. It has an “almost too good to be true” feel to it, so I’m certainly not presenting it as the only hypothesis. But I’m eagerly on the lookout for signs it might be true and clues as to how to actually do what I described.