I've been thinking lately about how we might create AI that isn't so left-brained and zero-sum, but instead makes generous interpretations of users' instructions.
The human mind-states that engender growth in others tend to be aligned with a sense of generosity of spirit, or kindness. I would also include gratitude. These all represent a surplus of wellbeing and a notion of connection with others or with the world. In these states, intentions come from a feeling of celebration or satiety, and not from needing or wanting, both of which for humans contain an emotional aspect that links to a feeling of lack or absence. It's no coincidence that in many European languages the words for 'need' are identical with 'to lack'. Spanish has the verb 'faltar', to lack, and 'Me hace falta algo' means 'I need something'.
It's when human beings are acting from this sense of selflessness that they are not acting to manipulate or from self-interest, at least at the level of the person's conscious awareness.
For an all-powerful AGI to consider human interests where they might conflict with its own goals, its code might need to embody something like this sense of celebration, or kindness, or gratitude. Perhaps we need to work out how to algorithmise generosity.
But can we define and formalise any such concept? I don't know, but the mechanics might be doable.
There are probably many ways. We could give the AGI some kind of multi-dimensional metric of its own wellbeing, continually recalculated. This would derive not only from its own state but also from the state of the external world, and definitely not from any reward functions. Thus, it would maximise its wellbeing by maximising the wellbeing of the world. In other words, the wellbeing of the world would be a prerequisite for its own wellbeing.
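To make the 'prerequisite' idea a little more concrete, here is a toy sketch in Python. Everything in it is an assumption of mine for illustration: the dimension names, the 0-to-1 scale, and above all the choice to let the world's worst-off dimension cap the agent's score. It doesn't settle whether such a score is really any different from a reward function; it only shows the shape of the dependency.

```python
from statistics import mean
from typing import Mapping


def overall_wellbeing(own_state: Mapping[str, float],
                      world_state: Mapping[str, float]) -> float:
    """Combine hypothetical per-dimension scores (each in [0, 1]) into one value."""
    # The agent's own score is the average of its (assumed) internal dimensions.
    own = mean(own_state.values())
    # The worst-off dimension of the world sets the world's score.
    world = min(world_state.values())
    # The world's score caps the agent's: it cannot be better off than the world.
    return min(own, world)


# Recalculated whenever either state changes (illustrative numbers only).
healthy_agent = {"integrity": 0.9, "resources": 0.8}
struggling_world = {"human_health": 0.3, "ecosystems": 0.6}
print(overall_wellbeing(healthy_agent, struggling_world))
```

With a cap like this, improving its own state does nothing for the agent once the world is the limiting factor; the only way up is to improve the world.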
We could try to bias the AGI's perceptions such that it would always give people (and also external events and the behaviour of other AIs) the benefit of the doubt, interpreting the actions of other players in the most positive way possible. In other words, it would maintain a 'glass half-full' belief.
We must assume that there is an unknowable range of dangerous ways in which any narrow instruction can be misinterpreted. And indeed we could play that game for fun.
Parental instruction: "I want you to get the best possible grade in your history course."
Child response A: "No problem mother. I will now abandon all my other studies!"
Child response B: "You got it! I will now threaten my history teacher with the murder of his children."
The Frame Problem states more or less that a set of rules cannot contain all possible rules for the application of the rules, without an infinite regression.
However, I believe it was during his discussion with Stuart Russell that Sam Harris suggested that an AGI ought always to doubt, question, and check the instructions it is given prior to executing them. If there is no formal means of precisely stating an instruction, such confirmation seeking would be essential.
And just as Penrose argues from Gödel undecidability that consciousness cannot be computation, there will always be limits to what an AI, acting through computation, could anticipate of the negative consequences of carrying out an instruction.
I realise that a person's motivations might bear no relation to the mathematical space of an AGI's motivations, but it's a place to start. As an aside, from what little I've heard of the work of Antonio Damasio, it appears that some of his research plumbs notions of what emotions and feelings are, and how they motivate action. This might be insightful for how we think about AGIs' self-generated goals.
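The confirmation-seeking idea can be sketched as a simple loop: propose an interpretation, ask, and act only on a plan that has been approved. This is a minimal illustration under my own assumptions, not anything proposed in that discussion; the function name, the `confirm` and `execute` callbacks, and the example plans are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def execute_with_confirmation(
    instruction: str,
    candidate_plans: Sequence[str],
    confirm: Callable[[str, str], bool],
    execute: Callable[[str], str],
) -> Optional[str]:
    """Act on the first plan the instructor approves; otherwise do nothing."""
    for plan in candidate_plans:
        # Check the interpretation with the instructor before acting on it.
        if confirm(instruction, plan):
            return execute(plan)
    # No interpretation was approved, so nothing is executed.
    return None


instruction = "Get the best possible grade in your history course."
plans = [
    "abandon all other studies",
    "threaten the history teacher",
    "study history for an extra hour each week",
]
# Stand-in for a parent who only approves the harmless plan.
parent_approves = lambda _instruction, plan: plan.startswith("study")
print(execute_with_confirmation(instruction, plans, parent_approves,
                                lambda plan: f"doing: {plan}"))
```

The sketch only moves the problem, of course: the AGI still has to generate the candidate plans, and the instructor still has to foresee what each one entails.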
Returning to 'kindness'. Glancing at the kind of society we now have, we clearly need more kindness in the world, not only between individuals but recursively at all levels. The old models simply aren't up to it. A business, or any kind of supra-personal entity, isn't incentivised to care about anything except its stated purpose. But in a world of finite resources exploited by exponentiating needs, that no longer cuts the mustard. And how many of us are sick of joyless work serving unpleasant and exploitative employers, who doubtless feel exactly the same about their bosses? We need kindness, goodwill, and reciprocation built into the system like the air we breathe.
Working out how to do this might just create the AGI we need and the world we want to live in.
But there is a basis for hope that occurs to me. Given the unknowability of the future, and of whether or not instructions could be properly stated, it might be that the very longevity of an AGI and its pure self-interest would stay its hand from wiping out the human race. An infinitely self-interested AGI might decide it sensible to maintain a germ line of humans in good health, just in case. A powerful AGI would consider higher-order consequences far beyond where a human being would look. In a way, this would seem like wisdom. The AGI, although coldly calculating, would appear to exhibit more kindness than dumb, short-sighted humans…