AI: This is how researchers make computers that can think
Deep learning is susceptible to bias. Therefore, researchers are working to develop new methods to teach computers to think about their decisions.
We live in a time when computers have not only become an indispensable part of our daily lives but are also beginning to replace people in tasks that our lives depend on. We trust computers to control nearly all air traffic; we build cars that are driven by computers; we use computers to administer medicine to patients in hospitals; and we are starting to use computers to diagnose diseases.
But how do we know that we can trust the programs that are used to support all these activities? How can we be sure that our planes will not crash? How do we know that our self-driving car will not run over the old lady crossing the street?
Is it really better to trust a computer than a nurse or a doctor?
The answer is yes – if the programs carrying out these tasks are certified.
Is it a plane? Is it a bird? Yes, it is a bird
Most of today's artificial intelligence systems are based on deep learning. The principle behind deep learning is that a computer can learn from examples, just like children do.
For example, we can train a computer to recognize pictures of birds by giving it a large number of pictures, some of which show birds and some of which do not.
For each picture, the computer needs to guess whether it shows a bird or not, and it is told whether it has guessed correctly. The computer's guesses become better and better, until it (almost) stops making mistakes.
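The guess-check-adjust loop described above can be sketched with a perceptron, a single artificial neuron that is far simpler than a deep network but learns in the same spirit. This is a minimal sketch under a stated assumption: each picture has already been reduced to three hypothetical yes/no features (`has_feathers`, `has_wings`, `has_wheels`), and the examples are invented.

```python
# Perceptron sketch: learn "bird / not bird" from labelled examples.
# Assumption: each picture is already reduced to hypothetical 0/1 features
# (has_feathers, has_wings, has_wheels).

def predict(weights, bias, features):
    """Guess 1 (bird) if the weighted sum is positive, otherwise 0 (not bird)."""
    total = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if total > 0 else 0


def train(examples, epochs=20, rate=1.0):
    """Nudge the numbers (weights) every time a guess turns out to be wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * f for w, f in zip(weights, features)]
                bias += rate * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


# (has_feathers, has_wings, has_wheels) -> 1 if the picture shows a bird
examples = [
    ((1, 1, 0), 1),  # sparrow
    ((1, 1, 0), 1),  # pigeon
    ((0, 1, 1), 0),  # plane
    ((0, 0, 1), 0),  # car
    ((0, 0, 0), 0),  # cat
]

weights, bias = train(examples)
print(predict(weights, bias, (1, 1, 0)))  # prints 1: feathers and wings -> bird
```

Note that the learned weights are just numbers; nothing in them says "feathers matter" in a way a person would recognize, which is exactly the problem raised later in the article.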
In the same way, we can train a computer to make medical diagnoses, to the point where it actually makes fewer mistakes than a human doctor.
However, two problems arise. First, the computer's answers can show bias or prejudice; second, its answers come with no explanation.
Bias or prejudice typically originates in hidden patterns in the examples used during training: patterns that nobody had noticed beforehand.
A scary example of just how wrong things can go happened when Amazon tried to design a computer program to sort through the CVs of candidates applying for a job.
After several rounds of testing, it became clear that the program systematically downgraded female applicants.
This happened because the program was trained on data from the company's hiring over the previous 10 years. Since the tech industry is dominated by men, the computer ‘learned’ that women were not good candidates.
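To see how a hidden pattern in historical data turns into prejudice, here is a deliberately naive sketch. The records are invented, and the "model" simply learns past hiring rates per feature; this is an illustration of the mechanism, not a description of how Amazon's system actually worked.

```python
from collections import defaultdict

# Hypothetical historical records: (gender, years_of_experience, was_hired).
# The numbers are invented; they mimic a past in which mostly men were hired.
history = [
    ("M", 5, True), ("M", 3, True), ("M", 6, True), ("M", 2, False),
    ("M", 4, True), ("M", 5, True),
    ("F", 5, False), ("F", 6, False), ("F", 4, True), ("F", 3, False),
]


def learn_hire_rates(records, key):
    """Fraction of past applicants who were hired, grouped by one feature."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        value = key(record)
        total[value] += 1
        hired[value] += int(record[2])
    return {value: hired[value] / total[value] for value in total}


# "Training": the model picks up every pattern in the data,
# including the gender pattern nobody intended it to learn.
rate_by_gender = learn_hire_rates(history, key=lambda r: r[0])
rate_by_experience = learn_hire_rates(history, key=lambda r: r[1] >= 4)


def score(gender, years):
    """Naive score: the average of the learned hiring rates for each feature."""
    return (rate_by_gender[gender] + rate_by_experience[years >= 4]) / 2


# Two applicants with identical experience, differing only in gender:
print(score("M", 5), score("F", 5))  # the man gets the higher score
```

Simply deleting the gender column would not necessarily fix this: other features that correlate with gender can carry the same pattern, which is one reason such bias is hard to spot.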
In this instance it was fortunately possible to detect the problem early on, and to understand its causes – and the program was never put to use. But it is not always so easy to understand what is going wrong.
We need to know where mistakes happen
Since deep learning works by fine-tuning huge numbers of parameters that have no obvious meaning on their own, it is in practice nearly impossible to understand the reasons behind the answers the computer gives.
And this poses a huge ethical problem. It would simply not be acceptable to tell someone that their loved one has died because the computer gave a wrong answer for some unknown reason.
Deep learning has been hugely successful in many areas. However, these concerns have motivated researchers to look for alternative ways of developing systems that perform sensitive tasks.
The most popular alternatives fall in two categories:
- programs that can explain their answers (explainable AI);
- programs that cannot make mistakes (certified programs).
The idea behind both alternatives is that knowledge is encoded directly in the computer program.
Instead of showing the computer examples and letting it figure out what is right and what is wrong, we teach it how to reason about the problem – for example, how a human would decide on a flight route, or how a doctor diagnoses a particular disease.
'You owe me an explanation, computer!'
In other words: instead of teaching the computer like a child (through examples), we teach it like an adult (with explanations and methods). Furthermore, the information is organized in such a way that the computer can explain its reasoning when it gives an answer.
If we are in doubt about whether the answer is correct, we can now analyze the explanation to see whether the computer has noticed something that we overlooked. We may also discover that there is an error in the computer's program, and then we have enough information to fix it.
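A minimal sketch of "reasoning with explanations": hand-written rules are applied by forward chaining, and the program records which rule produced each conclusion so it can explain itself afterwards. The rules and fact names here are hypothetical and intentionally toy-sized (birds rather than diseases); real expert systems are far larger, but the principle of recording a derivation is the same.

```python
# Hypothetical expert rules: (rule name, premises, conclusion).
RULES = [
    ("R1", {"has_feathers"}, "is_bird"),
    ("R2", {"lays_eggs", "has_beak"}, "is_bird"),
    ("R3", {"is_bird", "cannot_fly", "swims"}, "is_penguin"),
]


def infer(facts):
    """Forward chaining: apply rules until nothing new can be derived.
    Returns the known facts and, for each one, why it is believed."""
    known = set(facts)
    reasons = {fact: "given" for fact in facts}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                reasons[conclusion] = (name, sorted(premises))
                changed = True
    return known, reasons


def explain(fact, reasons, depth=0):
    """Turn the recorded reasons into an indented, human-readable explanation."""
    reason = reasons[fact]
    if reason == "given":
        return ["  " * depth + f"{fact}: given"]
    name, premises = reason
    lines = ["  " * depth + f"{fact}: by rule {name} from {', '.join(premises)}"]
    for premise in premises:
        lines += explain(premise, reasons, depth + 1)
    return lines


known, reasons = infer({"has_feathers", "cannot_fly", "swims"})
print("\n".join(explain("is_penguin", reasons)))
```

The printed explanation shows that `is_penguin` follows by rule R3 from `cannot_fly`, `is_bird` and `swims`, and that `is_bird` in turn follows by rule R1 from `has_feathers`. If a conclusion looks wrong, the faulty rule can be read off directly and corrected.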
Ideally, we could also prove that there are no errors in the program. In this context, we do not mean that we have tested the program without finding problems: we mean that we actually have a mathematical proof of the program's properties.
A logical language
In order to do this, we first write down the properties we desire (the plane will not crash, the car will not run over the old lady, the patient will not get too much medicine or a wrong diagnosis) in a special kind of language.
We call such a language a logic, and the ‘sentences’ that describe these properties are called formulas.
We also make a model of the programming language using logic. This means that we have formulas expressing how the program works – for example, if the program saves the value 5 in a variable x and later reads x's value, then it is guaranteed to get 5.
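As a sketch of what such formulas might look like: the first line states a medication property, and the next two model program variables as a store $s$ that can be written to and read from. The names $\mathit{dose}$, $\mathit{maxDose}$, $\mathit{read}$, $\mathit{write}$ and $s$ are hypothetical; the two store equations are the standard read-over-write axioms from the logical theory of arrays, which is one common way (among several) of modelling program memory.

```latex
% Hypothetical property: the dose given at any time t never exceeds the maximum
\forall t.\; \mathit{dose}(t) \le \mathit{maxDose}

% Model of variables: writing v to x and then reading x gives back v ...
\mathit{read}(\mathit{write}(s, x, v),\, x) = v
% ... and writing to x leaves every other variable y unchanged
x \neq y \;\Rightarrow\; \mathit{read}(\mathit{write}(s, x, v),\, y) = \mathit{read}(s, y)

% The example from the text: store 5 in x, then read x
\mathit{read}(\mathit{write}(s, x, 5),\, x) = 5
```

The last line is simply the first axiom with $v = 5$; proving properties of a real program means chaining many such steps together.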
The next step is to use logical tools to prove that the properties we described hold.
In this step, we also use specialized computer programs that help in checking proofs – and, in some cases, take care of some steps of the proofs without help.
How do we know that these programs do not make mistakes? Because we have also proved (mathematically) that they are error-free.
Typically, there is a small part of the program that has been proven correct on paper; afterwards, this small part is used to show that the rest of the program is also correct.
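The idea of a small, trusted part can be illustrated with a toy proof checker. This is a hypothetical sketch, not the kernel of any real proof tool: formulas are nested tuples, and the only two rules are "use a hypothesis" and modus ponens (from A and "A implies B", conclude B). The checker is short enough to verify by hand, and it does not matter who or what produced the proof, because every step is rechecked.

```python
# A formula is an atom (a string) or an implication ("->", A, B).
# A proof is a list of steps, each checked against the earlier ones:
#   ("hyp", F)    F must be one of the given hypotheses
#   ("mp", i, j)  modus ponens: step i proved A, step j proved ("->", A, B)


def check(hypotheses, proof):
    """Return the list of proved formulas, or raise ValueError on a bad step."""
    proved = []
    for k, step in enumerate(proof):
        if step[0] == "hyp":
            formula = step[1]
            if formula not in hypotheses:
                raise ValueError(f"step {k}: {formula!r} is not a hypothesis")
        elif step[0] == "mp":
            i, j = step[1], step[2]
            if not (0 <= i < k and 0 <= j < k):
                raise ValueError(f"step {k}: may only cite earlier steps")
            a, implication = proved[i], proved[j]
            if not (isinstance(implication, tuple) and implication[0] == "->"
                    and implication[1] == a):
                raise ValueError(f"step {k}: modus ponens does not apply")
            formula = implication[2]
        else:
            raise ValueError(f"step {k}: unknown rule {step[0]!r}")
        proved.append(formula)
    return proved


# Hypothetical facts about a medication system.
hypotheses = {
    "dose_checked",
    ("->", "dose_checked", "dose_safe"),
    ("->", "dose_safe", "patient_ok"),
}
proof = [
    ("hyp", "dose_checked"),
    ("hyp", ("->", "dose_checked", "dose_safe")),
    ("mp", 0, 1),                                  # dose_safe
    ("hyp", ("->", "dose_safe", "patient_ok")),
    ("mp", 2, 3),                                  # patient_ok
]
print(check(hypotheses, proof)[-1])  # prints patient_ok

try:
    check(hypotheses, [("hyp", "patient_ok")])  # claims the goal without proof
except ValueError as err:
    print("rejected:", err)
```

A large automated prover could generate proofs like `proof` above; only this checker needs to be trusted.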
It becomes extremely complicated
This process can become immensely complex, since there may be a long chain of programs, each checking the previous one.
It is not unusual to have three or four levels, where program A is shown to be error-free by program B, which is shown to be error-free by program C, which...
Just to give an idea of how large this problem can become: the list of all properties required from the software used in air traffic control fills around 1,000 pages; and some of the largest proofs currently produced by computers require more than 1 petabyte (1 million gigabytes) to write down.
Sometimes it is simply not possible to prove all the properties one would like to hold. Therefore, it is always a good idea for the program to return an explanation of its answers.
Each solution has its weak spots
In some cases, we cannot certify the program that computes the answers, but we can still guarantee that the answers are correct by checking the explanations.
And yes, this can also be done by yet another computer program.
In this process we use two programs: one to find answers and explanations, which possibly makes mistakes once in a while; and another to check the explanations and tell us whether the answers are correct – and this program has been proved not to make mistakes.
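A minimal sketch of this two-program setup, using Boolean satisfiability as the example problem (our choice, not the article's). The "untrusted" solver here is a brute-force search standing in for a large, fast and possibly buggy program; its explanation is the assignment of values it found. The trusted checker only has to confirm that every clause contains at least one true literal, which is easy to verify by hand.

```python
from itertools import product

# A formula is a list of clauses; each clause is a list of literals.
# Literal +n means "variable n is true", -n means "variable n is false".


def untrusted_solver(clauses, num_vars):
    """Stand-in for a complex solver that might contain bugs.
    Returns an assignment {variable: bool} as its explanation, or None."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return assignment
    return None


def trusted_checker(clauses, assignment):
    """Small enough to verify by hand: every clause needs one true literal.
    A variable missing from the assignment never makes a literal true."""
    return all(
        any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def solve_and_certify(clauses, num_vars):
    """Accept the solver's answer only if its explanation passes the checker."""
    assignment = untrusted_solver(clauses, num_vars)
    if assignment is not None and trusted_checker(clauses, assignment):
        return assignment
    return None  # no verified answer: none found, or the explanation was rejected


clauses = [[1, 2], [-1, 3], [-2, -3]]
print(solve_and_certify(clauses, 3))
print(trusted_checker(clauses, {1: True, 2: True, 3: True}))  # a wrong answer is caught
```

One design limitation is worth noting: when the solver answers "no solution exists", a list of values cannot serve as the explanation. Real SAT solvers handle that case by emitting a separate proof certificate (for example in the DRAT format) that a verified checker can then validate.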
Deep learning, explainable AI and certified programs are very different ways to produce computer programs that can solve complicated problems, and each has its advantages and disadvantages.
Deep learning has produced computer programs that perform better than humans in many areas (is it a bird or not?), but we can find ourselves in an ethical mess if they happen to give wrong answers (medical diagnosis).
Certified programs cannot make mistakes, but writing down and proving all properties that they should respect can be a huge (or impossible) task.
Explainable AI, on the other hand, requires us to analyze the explanations every time in order to decide whether or not to trust an answer.
The best would be a combination
The best scenario is one where the different methods are combined based on how effective they are, or how important it is that mistakes do not happen.
In medical diagnosis, the best results are obtained by using both a computer program and a human doctor.
When the doctor and the computer disagree, the doctor can look through the computer's explanation to understand its reasoning and decide who is right.
In this way, mistakes become very unlikely: they can only happen if both the doctor and the computer independently arrive at the same wrong conclusion (or if the doctor's judgment of who is right is flawed).
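The claim that mistakes become very unlikely can be made concrete with a back-of-the-envelope bound. Let $p_d$ be the probability that the doctor is wrong and $p_c$ the probability that the computer is wrong. The error rates used below are invented purely for illustration, and the bound covers only the first case in the sentence above: it assumes the two errors are independent and that the doctor's judgment of who is right is flawless.

```latex
% An undetected error requires the same wrong conclusion from both,
% which is a special case of both being wrong:
P(\text{same wrong conclusion}) \le P(\text{both wrong}) = p_d \, p_c

% With hypothetical error rates p_d = 0.05 and p_c = 0.03:
P(\text{same wrong conclusion}) \le 0.05 \cdot 0.03 = 0.0015
```

Under these assumptions, that is at most 15 undetected errors per 10,000 cases, compared with 500 for the doctor working alone. If the errors are correlated (for instance, because both were misled by the same unusual symptoms), the product can badly underestimate the real risk.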
On the other hand, since we can prove the most important properties of air traffic control systems, we do not need to check each individual answer. And an app that looks at a picture of a flower and tells us what kind of flower it is might well be allowed to make mistakes once in a while.
Therefore we should continue developing these different techniques in order to get the best results, rather than focusing on only one of them.
Read this article in Danish at Videnskab.dk's Forskerzonen.