The Why of Explainable AI
Bahador Khaleghi
August 19 10 min

Interest in Explainable AI research has been growing along with the capabilities and applications of modern AI systems. As AI makes its way into our daily lives, it becomes increasingly crucial for us to know how it works. Explainable AI has the potential to make AI models more trustworthy, compliant, performant, robust, and easier to develop. That can in turn drive business value and widen adoption of AI solutions.

Why do we care about XAI?

Imagine an AI system used for investing that provides seemingly counterintuitive advice when picking stocks. Or an AI-powered medical diagnosis system that predicts cancer in a patient previously diagnosed as healthy by medical experts. Or an autonomous vehicle that drives erratically and causes a fatal collision despite normal road conditions. In all these cases, it’s imperative to know why the AI system behaved in the way it did and how it came to its decisions.

These questions are the subject of the field of Explainable AI, or XAI, which is about explaining why and how an AI model came to a decision. Like AI itself, XAI isn’t a new domain of research, but recent advances in the theory and applications of AI have put new urgency behind efforts to explain it. According to an annual survey by PwC, the vast majority (82%) of CEOs agree that for AI-based decisions to be trusted, they must be explainable. In addition, Google Trends results show a steady increase in worldwide interest in Explainable AI over the past few years.

This new interest in XAI is mainly driven by the ever-increasing prevalence and complexity of modern AI systems, which are becoming more capable and finding their way into new industries and applications. These enhanced capabilities mean more complexity, and that makes these systems more difficult to understand. XAI could alleviate this situation by proposing novel ways of explaining the underlying thinking process of AI systems.

In spite of the growing interest in XAI research, the AI community does not agree on whether it’s an important field of study. Some researchers, including Facebook’s Chief AI Scientist Yann LeCun, suggest that rigorous testing is enough to provide an explanation: you can infer a model’s reasoning by observing how the model acted in many different situations. More recently, Cassie Kozyrkov, Google’s Chief Decision Intelligence Engineer, made a similar case that XAI will not deliver and advocated thorough testing as an alternative.

Many researchers dispute these arguments. Microsoft Research’s Rich Caruana, for example, suggests that XAI is important when it comes to sensitive applications such as healthcare. Furthermore, these arguments against XAI assume that the only reason for explaining AI models is to establish trust in the model’s reasoning. They equate reliability, proven through rigorous testing, with trustworthiness: if the model is predictable and reliable, there’s no need for an explanation. Yet this approach fails in practice, because we often do not have comprehensive test datasets to perform rigorous evaluation in all cases. Consider the dynamic environments in which autonomous vehicles operate, where it would be impossible to test every situation or variable. Likewise, the cost of collecting comprehensive test data can be prohibitive in applications such as healthcare.

In addition, trust is not the only reason we are interested in XAI. There are several factors motivating XAI research, with trust only the most obvious. Others include regulatory compliance, detecting bias, and protections against adversarial techniques.


Trust

Trust is the first, and perhaps most important, driver of interest in XAI. It is especially significant for applications involving high-stakes decision-making and those where rigorous testing isn’t feasible due to a lack of sufficient data or the complexity of comprehensive testing (as in the autonomous vehicle example above).

To trust model predictions in such applications, users need to be confident that the predictions are produced for valid and appropriate reasons. This does not imply the goal of XAI is to fool human users into misplacing trust in a model. On the contrary, it means XAI must reveal the true limitations of an AI model so users know the bounds within which they can trust the model. This is particularly important because humans have already been shown to be prone to blindly trusting AI explanations. Some attribute this tendency to the so-called illusion of explanatory depth, a phenomenon in which a person falsely assumes that their high-level understanding of a complex system means they understand its nuances and intricacies. To avoid misplaced trust from users, the explanations provided by AI models must be truly reflective of how the model works, in a human-understandable form, and presented using interfaces capable of communicating their limitations.

Regulatory compliance

Many companies have to comply with a set of regional, national, or international regulations, and as more and more AI models are integrated into products, regulations are evolving to accommodate the new challenges. Some highly regulated industries, such as finance or insurance, are already mandated to provide explanations for predictions made by their models. For example, the Equal Credit Opportunity Act that governs the American financial industry states that “each applicant against whom adverse action is taken shall be entitled to a statement of reasons for such action from the creditor.” In addition, the recent enactment of the General Data Protection Regulation by the European Union has created a discussion around the right to explanation for AI-driven decision-making impacting EU citizens. To be compliant with such regulatory requirements, AI models must be developed with some notion of explainability in mind.


Generalization

A key requirement for an effective AI model is the ability to generalize, that is, to perform well on samples it has not seen in its training. To establish the generalization abilities of a model, it is often not enough to examine only its validation performance. This is because models are fundamentally learning associative (not necessarily causal) patterns in training data. Thus, it is possible for a model to yield high validation performance while relying on spurious associations. Explanations of model predictions provide a way to identify such spurious correlations and thus gain a better sense of the model’s generalization potential. Moreover, once in production, the generalization performance of a model could change over time due to a phenomenon known as concept drift, which is a change in the relationship between input and output data of the underlying problem. With explanations of model predictions available, concept drift would be easier to detect and handle.
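As a minimal sketch of how explanations can surface concept drift, consider a one-feature linear model, whose fitted slope is itself a simple explanation of how the input relates to the output. The helper names (fit_slope, drift_detected), the toy data, and the tolerance threshold are illustrative assumptions, not part of any specific method or library.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y on x (a model with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def drift_detected(old_window, new_window, tolerance=0.5):
    """Flag drift when the explanation (the slope) moves by more than
    `tolerance` between two time windows. The threshold is hypothetical."""
    old_slope = fit_slope(*old_window)
    new_slope = fit_slope(*new_window)
    return abs(new_slope - old_slope) > tolerance, old_slope, new_slope


# Toy data: the input-output relationship changes between the two windows.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
old_window = (xs, [2.0 * x + 1.0 for x in xs])   # y rises with x
new_window = (xs, [-1.0 * x + 1.0 for x in xs])  # y now falls with x

drifted, old_slope, new_slope = drift_detected(old_window, new_window)
print(drifted, old_slope, new_slope)
```

In this toy case the input distribution is identical in both windows, so only the change in the explanation reveals that the relationship has shifted. A real system would compare attributions from the deployed model and tune the threshold to the application.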

Debugging and enhancing AI models

Developing a performant AI model is an iterative process, where adjustments are made to incrementally improve performance. These adjustments are often guided by a process in which possible sources of model errors are identified and removed through careful examination and experimentation. Explanations can expedite this process by facilitating the identification of sources of model errors. For instance, influence functions identify the most influential training samples for a given model prediction. For incorrect predictions, this approach could provide clues regarding problems with training datasets, e.g. mislabeled training samples.
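Influence functions approximate how a prediction would change if a training sample were removed or reweighted, without retraining the model. On a tiny model the same quantity can be computed exactly by brute force, which is what the sketch below does with leave-one-out retraining of a one-feature least-squares line. It is a stand-in for illustration rather than an implementation of influence functions, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def leave_one_out_influence(xs, ys, x_test):
    """For each training sample, return how much the test prediction
    changes when that sample is removed and the model is refit."""
    full_prediction = predict(fit_line(xs, ys), x_test)
    influences = []
    for i in range(len(xs)):
        xs_without = xs[:i] + xs[i + 1:]
        ys_without = ys[:i] + ys[i + 1:]
        reduced_prediction = predict(fit_line(xs_without, ys_without), x_test)
        influences.append(full_prediction - reduced_prediction)
    return influences


# Toy training set: the true pattern is y = x, but the last label is wrong.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.0, 2.0, 3.0, 10.0]  # deliberately mislabeled sample at index 4

influences = leave_one_out_influence(train_x, train_y, x_test=5.0)
most_influential = max(range(len(influences)), key=lambda i: abs(influences[i]))
print(most_influential, influences)
```

The mislabeled sample has by far the largest effect on the suspicious prediction, which is the kind of clue described above. Retraining once per sample does not scale beyond toy problems, which is why approximations such as influence functions are used in practice.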

In addition, explanations of the inner workings of a model can enable the development of more efficient models. For example, the DeepEyes visual analytics system allows a detailed, per-layer analysis of deep convolutional networks to identify and eliminate dead filters, that is, filters that are either always or rarely activated. Finally, model explanations have been shown to be directly applicable, as regularizers, during the training process. In particular, constraining explanations to match known domain knowledge has yielded trained models with higher generalization performance.
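The notion of a dead filter can be made concrete with a small sketch that flags filters by how often they activate across a set of inputs. This is not how DeepEyes is implemented; the function name, the thresholds, and the activation values are assumptions chosen for illustration.

```python
def find_dead_filters(activations, low=0.05, high=0.95, eps=0.0):
    """Flag filters that are rarely or always active.

    activations: one list per input sample, holding one post-activation
    value per filter. A filter counts as active when its value exceeds eps.
    The low/high rate thresholds are hypothetical defaults.
    """
    num_samples = len(activations)
    num_filters = len(activations[0])
    dead = {}
    for f in range(num_filters):
        active_count = sum(1 for sample in activations if sample[f] > eps)
        rate = active_count / num_samples
        if rate <= low:
            dead[f] = "rarely activated"
        elif rate >= high:
            dead[f] = "always activated"
    return dead


# Toy activations for 4 inputs and 3 filters.
toy_activations = [
    [0.9, 0.0, 0.0],
    [0.7, 0.4, 0.0],
    [0.8, 0.0, 0.0],
    [0.6, 0.3, 0.0],
]

dead_filters = find_dead_filters(toy_activations)
print(dead_filters)
```

Filter 0 fires on every input and filter 2 never fires, so neither helps distinguish inputs, while filter 1 responds selectively and is kept.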

[Figure: Identifying and eliminating dead filters of a deep convolutional model using DeepEyes explanations. Dead filters are those that are either always or rarely activated.]

Detecting bias

AI models are often used to fully or partially automate a decision-making process. These models are trained with historical data that may contain the biases and prejudices of our society. Experts have warned that, if deployed blindly, these models can systematically reinforce our existing biases with devastating socio-economic consequences. Accordingly, the issue of bias and fairness in AI systems has received a lot of attention. Bias is a nuanced problem that can creep into various stages of the AI pipeline, not merely model development. When it comes to detecting bias in AI models, XAI could be valuable. For instance, a common type of explanation is feature attribution, namely, determining the relative importance of input features for given model prediction(s). If a protected feature, e.g. gender or race, is found to have a high significance for model predictions, the model could be said to be biased.
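To illustrate the feature attribution check described above, here is a minimal sketch for a linear scoring model, where a feature's contribution to one prediction is taken as its weight times its deviation from the dataset mean. The model weights, feature names, data, and the 0.25 threshold are all hypothetical, and this simple attribution is only one of many possible choices.

```python
def attribution_shares(weights, rows):
    """Return each feature's share of the total mean absolute attribution.

    Attribution of feature f for one row is weights[f] * (row[f] - mean_f).
    """
    means = {f: sum(row[f] for row in rows) / len(rows) for f in weights}
    importance = {
        f: sum(abs(weights[f] * (row[f] - means[f])) for row in rows) / len(rows)
        for f in weights
    }
    total = sum(importance.values())
    return {f: value / total for f, value in importance.items()}


def flag_protected(shares, protected, threshold=0.25):
    """List protected features whose attribution share exceeds the threshold."""
    return [f for f in protected if shares.get(f, 0.0) > threshold]


# Hypothetical linear credit-scoring model and a tiny dataset.
weights = {"income": 0.5, "tenure": 0.2, "gender": 1.5}
rows = [
    {"income": 1.0, "tenure": 2.0, "gender": 0.0},
    {"income": 3.0, "tenure": 4.0, "gender": 1.0},
    {"income": 2.0, "tenure": 1.0, "gender": 0.0},
    {"income": 2.0, "tenure": 1.0, "gender": 1.0},
]

shares = attribution_shares(weights, rows)
flagged = flag_protected(shares, protected=["gender"])
print(shares, flagged)
```

A high share for a protected feature is a warning sign worth investigating. A low share does not prove the model is fair, since other features can act as proxies for the protected one.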

Hypothesizing about new knowledge

A decade ago, in the early days of the so-called Big Data era, there were claims that the scientific method was becoming obsolete. The premise was that instead of using the (often tedious) scientific method to theorize about our world, we could simply rely on our abundant observational data. In other words: with enough data, the numbers speak for themselves. More recently, however, the importance of having a theoretical understanding of what the numbers imply is resurfacing. Indeed, some argue that with data abundance, having a theory matters even more than before. Yet coming up with data-driven hypotheses that could lead to new theories remains challenging. This is where model explanations could be helpful.

Explanations of an AI model’s predictions are typically compared against established patterns, which are based on domain knowledge and/or simply common sense, to assess and validate the underlying thinking process of the model. In some empirical scientific applications, however, e.g. biology and physics, such pre-existing knowledge may not be available. In those cases, model explanations can instead be used to generate scientific hypotheses. The generated hypotheses can then be evaluated, e.g. through experimentation, and if verified, result in new scientific discoveries. It is worth mentioning that stability of explanations is crucial in such applications, since a hypothesis built on an explanation that changes between runs is unlikely to hold up.

XAI and adversarial machine learning

Adversarial attacks are processes where a model’s input is manipulated, often in a human-imperceptible way, to make its prediction invalid. Adversarial defences, on the other hand, aim to protect against such attacks and achieve robust models. For AI models to be deployed in the wild, they must be as robust as possible. This requirement has made adversarial machine learning a highly active research area.
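A small perturbation flipping a prediction can be shown on a toy linear classifier. The sketch below moves each input feature by at most epsilon in the direction given by the sign of its weight, in the spirit of gradient-sign attacks. The weights, the input, and the value of epsilon are made up for illustration, and real attacks target far larger models.

```python
def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    """Binary linear classifier: class 1 when the score is positive."""
    return 1 if score(weights, bias, x) > 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def sign_attack(weights, x, epsilon, label):
    """Shift every feature by at most epsilon to push the score away
    from the current label. For a linear model the gradient of the
    score with respect to the input is simply the weight vector."""
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input.
weights = [1.0, -2.0, 0.5]
bias = 0.0
x = [0.3, 0.1, 0.2]
epsilon = 0.1

original_label = predict(weights, bias, x)
x_adv = sign_attack(weights, x, epsilon, original_label)
adversarial_label = predict(weights, bias, x_adv)
print(original_label, adversarial_label, x_adv)
```

No feature moves by more than epsilon, yet the predicted class changes, because the small shifts all line up with the directions the model is most sensitive to.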

Some argue that the two fields of XAI and adversarial machine learning are deeply related. There are several examples of using XAI methods both to perform attacks, e.g. crafting adversarial training samples, and to defend against attacks, e.g. developing detectors for adversarial samples. On the other hand, it has been shown that certain methods of making a model more robust can make it more interpretable as well. Some even argue that the main underlying reason for susceptibility of AI models to adversarial attacks is the existence of non-robust features: “features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans.” In other words, robust models rely on robust feature representations that are often more human-aligned than regular representations.

[Figure: Using explanations of model predictions in terms of training examples to engineer adversarial training samples that flip the predictions of several test examples.]

What’s next?

XAI is an important research area within the AI community. Explanations of AI models have the potential to make our AI systems more trustworthy, compliant, effective, fair, and robust, and that could drive adoption and business value.

In the next part (coming soon), we will explore what it really means to explain an AI model by reviewing some of the most notable literature.

Special thanks to Xavier Snelgrove, Elnaz Barshan, Lindsay Brin, Santiago Salcido, Manon Gruaz, Genevieve Merat, Simon Hudson, and Jean-Philippe Reid for valuable comments and illustrations. Edited by Peter Henderson.