XAI - Master Thesis TU Berlin 2020
Abstract
Despite their high accuracy, many users are reluctant to trust ML models in critical situations because of their opaqueness. Fortunately, recent interpretability techniques have made faithful explanations of complex models possible, giving users the ability to understand the reasoning of their models. This field is known as explainable AI, interpretable ML, or XAI. It not only makes it possible to understand the underlying logic of the model or its individual predictions, but also aims to detect flaws and biases, gain new insights into the problem, verify the correctness of the predictions, and ultimately improve or correct the model itself. Moreover, emerging regulations have made the audit and verifiability of decisions made by ML or AI systems mandatory, increasing the demand for explainability and the ability to question decision systems. The research community has identified this interpretability problem and has developed theories and methods to address it, with technical contributions being the main focus. As a result, important questions on the conceptual side still need to be addressed. For instance, a formal definition of interpretability has not yet been agreed upon, and it has now become crucial to reach a consensus on a proper definition of explainability in the AI context. How to assess the quality of an explanation is another question that is becoming increasingly important for advancing the field. Answers to these questions remain vague, in the sense that different metrics are needed for different use cases and different users. Hence, amid all these techniques and the metrics to evaluate them, it remains hard for a user to determine which explanation technique is most closely aligned with their understanding and best suited to their use case. Agreeing on a definition of explainability and on quantitative metrics to evaluate it would contribute significantly to the development of new, efficient, and trusted models and explainability methods.
In this work, we implement a proof of concept of the idea that interpretability cannot be broadly defined or generalized for all humans: it remains a polylithic concept that differs for every user. Furthermore, we demonstrate that clustering users according to their expertise allows us to reach a good compromise in the trade-off between giving the most suitable explanation to each individual user and giving the overall best explanation to all users. By finding patterns in the preferences of each type of profile, we were able to distinguish explanation features (criteria) that are important to each of them. This work can therefore be extended to other fields, and to other profiles, in order to enhance users' understanding of explanations, their satisfaction, and their trust in decision systems.
Keywords
Explainable Artificial Intelligence, Interpretable Machine Learning, Deep Learning, Interpretability, Comprehensibility, Explainability, Black-box models, Post-hoc interpretability.