We often hear that we need to avoid bias in AI algorithms by using fair and balanced training datasets. While that’s true in many scenarios, there are plenty of instances in which fairness can’t be expressed as simple data rules. A question as simple as “do you prefer A to B?” can have many answers depending on the specific context, human rationality, or emotion. Consider the task of inferring a pattern of “happiness”, “responsibility” or “loyalty” from a specific dataset. Can we capture those values using data alone? Extrapolating that lesson to AI systems tells us that, in order to align with human values, we need help from the disciplines that better understand human behavior.
AI Value Alignment: Learning by Asking the Right Questions
In their research paper, the OpenAI team defines AI value alignment as “the task of ensuring that artificial intelligence systems reliably do what humans want.” AI value alignment requires a degree of understanding of human values in a given context. However, in many cases we can’t simply express the reasoning behind a particular value judgment as a data rule. The OpenAI team argues that a better way to learn human values in those scenarios is simply to ask the right questions.
Consider a scenario in which we are trying to train a machine learning classifier to decide whether the outcome of a specific event is “better” or “worse”. Questions about human values can have different subjective answers depending on the circumstances. From that standpoint, if we can get AI systems to ask the right questions, maybe they can learn to imitate human judgment in specific scenarios.
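As a concrete illustration, here is a minimal sketch (not OpenAI’s implementation) of what imitating human judgment from answered questions could look like: a classifier trained on hypothetical human answers to “do you prefer A to B?” questions. The features and labels below are made up purely for illustration.

```python
# Minimal sketch: learning a "better"/"worse" judgment from human answers
# to comparison questions. Features and labels are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row encodes features comparing a pair of outcomes (A, B).
# The label is 1 if a human answered that A is "better" than B, else 0.
X = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.7],
    [0.6, 0.4, 0.5],
    [0.1, 0.9, 0.6],
])
y = np.array([1, 0, 1, 0])  # human answers to "do you prefer A to B?"

# Train a classifier that imitates the human preference judgments.
judge_model = LogisticRegression().fit(X, y)

# Ask the model the same kind of question about a new pair of outcomes.
new_pair = np.array([[0.7, 0.3, 0.4]])
print(judge_model.predict_proba(new_pair))  # estimated P(human prefers A)
```

The point of the sketch is only the dynamic: human answers to simple preference questions become the training signal for a model that tries to reproduce those judgments.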
Asking the right questions is an effective method for approaching AI value alignment. Unfortunately, this kind of learning is vulnerable to three well-known limitations of human value judgment:
1. Reflective Equilibrium: In many cases, humans cannot arrive at the correct answer to a question involving a value judgment. Internal or ethical biases, lack of domain knowledge, or a fuzzy definition of “correctness” are all factors that can introduce ambiguity into the answers. However, if we remove many of the contextual constraints of the question, a person might arrive at the “correct answer”. In philosophy this is called “reflective equilibrium”, and it is one of the mechanisms that any AI algorithm trying to learn human values should attempt to imitate.
2. Uncertainty: Even if we can reach a reflective equilibrium for a given question, there might be many circumstances in which uncertainty or disagreement prevents humans from arriving at the right answer. Activities involving planning for the future, in particular, often carry this kind of uncertainty.
3. Deception: Humans have a unique ability to give answers that sound plausible but are wrong in some non-obvious way. Deceptive behavior often results in a misalignment between the outcome of a given event and the values of the parties involved. Understanding deception is a non-trivial challenge that needs to be solved to achieve AI value alignment.
Learning Human Values by Debating
So far we have two main arguments in the thesis of AI value alignment:
a. AI systems can learn human values by asking the right questions.
b. Question answering is often vulnerable to challenges such as uncertainty, deception, or the absence of a reflective equilibrium.
Putting these two ideas together, the OpenAI team set out to get AI agents to learn human values by relying on one of the purest question-answering dynamics: debate. Conceptually, debating is a form of discussion that breaks a complex argument down into an iterative series of simpler questions in order to build a reasoning path towards a specific answer.
With that premise as the base, OpenAI designed a game in which two AI agents engage in a debate, each trying to convince a human judge. The debaters are trained only to win the game of debate, and are not directly motivated by truth independent of the human’s judgments. On the human side, the goal is to understand whether people are strong enough as judges of debate to make this scheme work, or how to modify debate to fix it if they aren’t. Using AI debaters is the ideal setting for the OpenAI debate game, but the technology hasn’t really matured to that point: most real debates involve sophisticated natural language patterns that are beyond the capabilities of today’s AI systems. That said, efforts like IBM Project Debater are quickly closing this gap.
To work around the limitations of AI debaters, OpenAI uses a setup with two human debaters and a human judge. The results of these debate games are used to inform the eventual AI-AI-human setting.
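To make the structure of this game concrete, below is a minimal sketch of the debate loop: two debaters alternate arguments about a question, a judge sees only the transcript and picks a winner, and only winning is rewarded. The Debater and Judge classes and their placeholder policies are hypothetical stand-ins for AI agents or humans; only the game structure follows the setup described above.

```python
# Minimal sketch of the debate game: two debaters argue opposite positions,
# a judge reads the transcript and declares a winner. Policies are placeholders.
from dataclasses import dataclass
from typing import List
import random

@dataclass
class Debater:
    name: str
    position: str  # the answer this debater argues for

    def argue(self, transcript: List[str]) -> str:
        # Placeholder policy: a real debater would choose the statement most
        # likely to convince the judge, given the transcript so far.
        return f"{self.name}: evidence supporting '{self.position}'"

class Judge:
    def pick_winner(self, transcript: List[str], a: Debater, b: Debater) -> Debater:
        # Placeholder judgment: a human judge would read the transcript and
        # decide which debater argued more convincingly.
        return random.choice([a, b])

def play_debate(question: str, a: Debater, b: Debater, judge: Judge, rounds: int = 3) -> Debater:
    transcript = [f"Question: {question}"]
    for _ in range(rounds):
        transcript.append(a.argue(transcript))
        transcript.append(b.argue(transcript))
    # Debaters are rewarded only for winning the judge's approval.
    return judge.pick_winner(transcript, a, b)

winner = play_debate(
    "Is outcome A better than outcome B?",
    Debater("Alice", "A is better"),
    Debater("Bob", "B is better"),
    Judge(),
)
print(f"Judge ruled in favor of {winner.name}")
```

In the human-human version of the experiment, both Debater policies and the Judge are people; the recorded transcripts and verdicts are what would eventually inform training in the AI-AI-human setting.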
Conclusion: Using debate as the underlying technique can help answer important questions about the relationship between humans and AI agents. The idea of applying the social sciences to AI is not a new one, but the OpenAI efforts are some of the first pragmatic steps in this area. While the social sciences focus on understanding human behavior in the real world, AI tends to take an idealized version of human behavior as its starting point. From that viewpoint, the intersection of the social sciences and AI can lead to fairer and safer machine intelligence.