Interpretability
Definition of 'interpretability'
Interpretability in Artificial Intelligence is the ability to explain or present something (e.g. a model or a prediction made by a machine) in human-understandable terms. The level of interpretability is the degree to which a human can understand the cause of a decision and can consistently predict the results of a model. Interpretability differs from 'explainability' in Artificial Intelligence: explainability answers a why-question. In machine learning, explainability is used to provide explanations of predictions made by black-box models. However, this does not make the black-box model interpretable: a layperson still does not understand how the model works. While explanations can be used to aid model interpretability, interpretable models generate their own explanations, as they are intrinsically understandable by humans.
Another concept closely related to interpretability is 'transparency'. While interpretability is defined by how well a person understands a decision or a model, transparency does not take this demand into account. Instead, transparency is defined by the degree to which something can be viewed by people. A programmer can make a model completely transparent, for example by publishing its complete code; however, this still does not mean that a layperson can interpret the model.
Implications of commitment to interpretability
The key requirement for the appropriate design of AI models with respect to interpretability is that we design intrinsically interpretable, model-specific models. These are models that are understandable in human terms by construction, not through post-hoc explanations. Such a model is also model-specific, as interpretability focuses on the internals of the model (e.g. its weights). This means that the explanation of the model follows from its inner workings and is therefore not generalizable to other models. Examples of intrinsically interpretable, model-specific models are a shallow decision tree or a sparse linear model, as sketched below.
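A minimal sketch of such a model, assuming Python with scikit-learn and the Iris dataset as a stand-in example: a shallow decision tree whose decision rules can be printed and read directly, so the explanation is the model itself rather than a post-hoc addition.

# Sketch: an intrinsically interpretable model (a shallow decision tree).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Limiting the depth keeps the model small enough for a human to read in full.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printed rules *are* the model: every prediction can be traced by hand.
print(export_text(tree, feature_names=list(data.feature_names)))

A sparse linear model is interpretable in the same spirit: with, say, lasso regularization, most coefficients are exactly zero, and the handful of non-zero weights can be read off directly.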
Societal transformations required for addressing concerns raised by interpretability
Machine learning models are especially problematic for interpretability. They are built from training data, which often contains biases. Without interpretability, these biases may go undetected, because people do not understand how the model reaches its decisions. Programmers often prefer complex models with post-hoc explanations over interpretable models, hoping to exploit the full capabilities of the technique and thereby reach maximum performance. However, a simple, intrinsically interpretable model often does not perform much worse. Especially for high-stakes decisions, for example acceptance to a school or the granting of a loan, a thorough analysis of the candidate models should be carried out before choosing one, since such decisions should involve as little bias as possible. A rough sketch of such a comparison is given below.
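As a rough illustration of the claim that a simple interpretable model often does not perform much worse, the following sketch (assuming Python with scikit-learn; the dataset and the two model choices are illustrative assumptions) compares a shallow decision tree against a black-box ensemble on held-out data. Any gap between the two depends entirely on the task and the data, so this is a template for the analysis rather than evidence for any particular case.

# Sketch: comparing an interpretable model against a black-box model on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Interpretable candidate: a tree small enough to inspect by hand.
interpretable = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
# Black-box candidate: a boosted ensemble of many trees.
black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("shallow tree accuracy:    ", accuracy_score(y_test, interpretable.predict(X_test)))
print("boosted ensemble accuracy:", accuracy_score(y_test, black_box.predict(X_test)))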