Can trust issues be solved by Explainable AI?
One could argue that AI is perfectly explainable: if you look at the details of a machine learning algorithm, you can see exactly how it works. The weights of a neural network or the rules in a decision tree allow exactly that. They will tell you that you didn’t get a loan because your age divided by your income was bigger than twice the duration of your last job. But an explanation like that shows how hard it can be to truly comprehend what a model does, let alone gain trust from it. AI models use machine learning to extract complex regularities from examples; they are not explicitly programmed using human-style reasoning. That complexity is inherent to AI’s strength and, ironically, also to its weakness.
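The loan rule above can be made concrete in a few lines. This is a toy, hypothetical model (the function name, inputs, and rule are invented for illustration): every detail is inspectable, yet the rule carries no human-understandable reasoning.

```python
# A toy loan model whose learned rule is fully visible,
# yet hard to interpret in human terms (a sketch, not a real credit model).
def loan_model(age, income, last_job_years):
    # The rule is completely transparent: deny when
    # age / income exceeds twice the duration of the last job...
    score = age / income - 2 * last_job_years
    return "deny" if score > 0 else "approve"

# ...but why would age divided by income relate to job duration at all?
print(loan_model(age=40, income=50_000, last_job_years=3))  # approve
```

Being able to read the rule is not the same as comprehending why it makes sense.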
Unfortunately, explaining how algorithms work fails to solve the underlying trust problem. AI can still behave in unexpected ways if people don’t grasp how it thinks.
In the writing group for ISO/IEC standard 5338, we’ve been trying to pinpoint this explainability aspect. The term ‘unpredictable’ is inaccurate because we can predict what a model will decide for a given input. Likewise, ‘opaque’ is imprecise because we can see what it does, and it doesn’t convey the element of unexpected behavior. After a long discussion exploring many candidate terms, we landed on ‘incomprehensible’.
Because AI is incomprehensible, we need an explanation to understand the decisions it makes. When the ‘computer says no’, the decision is hard to accept and trust because we know how difficult it is to create an accurate and fair model.
So, how can trust be gained if an explanation doesn’t suffice?
One proposed solution for gaining trust is to train a simpler, more readable surrogate model on the input and output of an advanced model. This can build trust by showing what decisions are made for which groups, but the result remains hard to comprehend because human reasoning and concepts are missing, and the surrogate will not accurately represent the advanced model’s behavior. The surrogate approach can be taken a step further by using a simpler, more readable model altogether. That can succeed in gaining trust, provided the model is comprehensible and the typical compromise in accuracy is acceptable.
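A minimal sketch of the surrogate idea, with everything invented for illustration: we query an opaque black-box model to build a labeled dataset, then fit the simplest possible surrogate (a single income threshold) and measure how faithfully it reproduces the black box.

```python
# Sketch of a surrogate model: approximate an opaque model with a
# one-variable threshold rule learned from its inputs and outputs.
# The black-box function and feature names are illustrative assumptions.

def black_box(income, debt):
    # Stands in for a complex, incomprehensible model.
    return 1 if (income * 0.7 - debt * 1.3 + (income * debt) % 7) > 100 else 0

# Query the black box to create a training set for the surrogate.
samples = [(inc, debt) for inc in range(0, 200, 10) for debt in range(0, 100, 10)]
labels = [black_box(inc, debt) for inc, debt in samples]

def fit_threshold(values, labels):
    # Pick the threshold that best reproduces the black box's decisions.
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        preds = [1 if v >= t else 0 for v in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

incomes = [inc for inc, _ in samples]
threshold, fidelity = fit_threshold(incomes, labels)
print(f"surrogate rule: approve if income >= {threshold}")
print(f"fidelity to the black box: {fidelity:.0%}")
```

The surrogate rule is readable, but its fidelity score makes the trade-off explicit: it only approximates the advanced model’s behavior.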
Another solution is to look at what part of the input matters the most in AI decisions through sensitivity analysis. This can point out that the model is looking at the wrong things and build a deeper level of trust as we can see whether we agree with the model’s prioritization of input variables.
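A simple form of sensitivity analysis can be sketched in pure Python (the model and features are assumed toys): nudge each input feature and count how often the decision flips, revealing which variables the model actually prioritizes.

```python
# Minimal sensitivity analysis on an assumed toy model: perturb each
# input feature by 10% and see how often the decision changes.
import random

def model(features):
    # Stands in for an opaque, trained model.
    age, income, shoe_size = features
    return 1 if income * 2 - age > 50 else 0  # shoe_size is ignored

random.seed(0)
names = ["age", "income", "shoe_size"]
base_points = [[random.uniform(20, 60), random.uniform(10, 60), random.uniform(35, 46)]
               for _ in range(200)]

for i, name in enumerate(names):
    flips = 0
    for point in base_points:
        perturbed = point.copy()
        perturbed[i] *= 1.10  # increase this feature by 10%
        flips += model(point) != model(perturbed)
    print(f"{name}: decision changed in {flips} of {len(base_points)} cases")
```

If an irrelevant variable like shoe size never changes the outcome while income frequently does, we can check whether that matches our expectations of a sensible model.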
Now that we have discussed several ways to make AI somewhat explainable, it is safe to say that an AI model is not a black box in theory, but in practice it is, even when it is explained. Thankfully, several explainable AI techniques help to gain some level of trust.
What if that is not enough?
An alternative solution is to skip comprehension altogether and perform proper testing: using a representative test set and the right metrics to measure how well the model performs. Performance can mean, for example, correctness and fairness, for which various metrics are available. This can provide more trust than trying to understand how an incomprehensible model works.
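As a sketch of what such testing could look like (the model, test set, and groups are entirely made up), we can measure accuracy on a held-out set together with a simple fairness metric, the demographic parity gap:

```python
# Trust through testing: measure correctness and a simple fairness
# metric on a hypothetical labeled test set.
def model(income, group):
    return 1 if income >= 40 else 0  # approve above an income threshold

# (income, group, true_label) — invented test data
test_set = [
    (55, "A", 1), (30, "A", 0), (45, "A", 1), (25, "A", 0),
    (50, "B", 1), (20, "B", 0), (39, "B", 1), (38, "B", 1),
]

preds = [model(income, group) for income, group, _ in test_set]
accuracy = sum(p == y for p, (_, _, y) in zip(preds, test_set)) / len(test_set)

# Demographic parity gap: difference in approval rates between groups.
def approval_rate(g):
    rates = [p for p, (_, grp, _) in zip(preds, test_set) if grp == g]
    return sum(rates) / len(rates)

parity_gap = abs(approval_rate("A") - approval_rate("B"))
print(f"accuracy: {accuracy:.2f}, demographic parity gap: {parity_gap:.2f}")
```

Here the numbers alone flag a problem: group B is approved less often than group A, even without any insight into why the model behaves that way.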
Still, testing of models also has its limitations when it comes to gaining trust. For AI systems, certain risks need to be addressed in the software and in the engineering processes. For example:
- protecting training data against poisoning attacks and data leaks,
- managing complex AI supply chain risks (AI bill of materials),
- continuously checking for unwanted bias and staleness of data,
- code quality control to prevent vulnerabilities and other risks, and
- logging – to name but a few.
Therefore, a good way to gain trust is to assess the software (e.g. through code review) and the engineering processes, to see whether the necessary best practices are applied and quality is built in.
An additional question behind wanting an explanation is to understand what needs to change in the input to get a different model decision, e.g. what you need to change to get a loan. This too can be done using sensitivity analysis: altering the input to discover what influences the model’s decision. No comprehension of the model’s inner workings is necessary.
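A counterfactual query of this kind can be sketched as a trivial search (the loan model and its inputs are assumptions for illustration): keep nudging one input until the decision flips, treating the model purely as a black box.

```python
# Counterfactual via sensitivity analysis: raise the income input until
# the (assumed toy) loan model's decision flips, with no knowledge of
# the model's inner workings.
def loan_model(income, debt):
    # Stands in for an opaque model.
    return "approve" if income * 0.8 - debt > 30 else "deny"

def minimal_income_for_approval(income, debt, step=1):
    while loan_model(income, debt) == "deny":
        income += step  # perturb one input, observe the output
    return income

needed = minimal_income_for_approval(income=30, debt=20, step=1)
print(f"loan approved once income reaches {needed}")
```

The answer to “what do I need to change to get a loan?” comes out directly, without comprehending the model itself.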
How about heuristic models?
Now that we have covered machine learning, the dominant form of AI, what about the other type: heuristic models, which consist of human-made rules? How explainable are they?
The advantage of heuristic models is that they are easier to validate since the individual rules can be understood, and an outcome can be explained by sharing the ruleset. Still, they are known to behave in unexpected ways because even though individual rules make sense, they can lead to surprising results when combined. For example, a traffic control system that prioritizes emergency vehicles may cause unexpected traffic jams. So, for heuristic models, it is still worthwhile to perform proper testing in order to gain trust.
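The traffic example can be illustrated with two invented rules (both the rules and the scenario are toys): each rule is sensible on its own, yet together they produce an outcome neither rule intended.

```python
# Toy illustration of rule interaction: each traffic rule is sensible
# alone, but combined they can starve one direction entirely.
def green_direction(emergency_ns, queue_ns, queue_ew):
    # Rule 1: always prioritize an emergency vehicle (north-south axis).
    if emergency_ns:
        return "NS"
    # Rule 2: otherwise, serve the longer queue.
    return "NS" if queue_ns >= queue_ew else "EW"

# With frequent emergencies on the NS axis, east-west never gets a
# green light, no matter how long its queue grows.
history = [green_direction(emergency_ns=True, queue_ns=0, queue_ew=q)
           for q in range(1, 6)]
print(history)
```

Reading either rule in isolation would not reveal the traffic jam; only testing the combined system does.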
Is the incomprehensible aspect applicable to only AI?
One could argue that complex non-AI systems with a long history can also be hard to comprehend and can behave unexpectedly. In other words: explanation is helpful for any complex system, but it is not a silver bullet for gaining trust.
The content of this blog will be added to the OWASP AI security & privacy guide at https://owasp.org/www-project-ai-security-and-privacy-guide/