Black-box vs. white-box models

Most machine learning systems require the ability to explain to stakeholders why certain predictions are made.

When choosing a suitable machine learning model, we often think in terms of the accuracy vs. interpretability trade-off:

Accurate and ‘black-box’: Black-box models such as neural networks, gradient boosting models or complicated ensembles often provide great accuracy. The inner workings of these models are harder to understand and they don’t provide an estimate of the importance of each feature on the model predictions, nor is it easy to understand how the different features interact.

Weaker and ‘white-box’: Simpler models such as linear regression and decision trees on the other hand provide less predictive capacity and are not always capable of modelling the inherent complexity of the dataset (i.e. feature interactions). They are, however, significantly easier to explain and interpret.

Image from Applied.AI

The accuracy vs. interpretability trade-off is based on an important assumption, namely that ‘explainability is an inherent property of the model’. I strongly believe, however, that with the right ‘interpretability techniques’, any machine learning model can be made more interpretable, albeit at a complexity and cost which is higher for some models than for others.

In this blog post, I will discuss some of the different techniques that can be used to interpret machine learning models.

The structure and content of this blog post are largely based on the H2O.ai booklet on Machine Learning Interpretability. I highly recommend reading the H2O.ai booklet or other material by Patrick Hall if you want to learn more!

Model properties

The degree of explainability of a model is often linked to two properties of the response function.

The response function f(x) of a model defines the input-output relationship between the input (features x) and the output (target f(x)) of a model. Depending on the machine learning model, this function can exhibit the following traits:

Linearity: In a linear response function, the association between a feature and the target behaves linearly. If a feature value changes, we expect the target to change at a constant, proportional rate.

Monotonicity: In a monotonic response function, the relationship between a feature and the target always goes in one direction (increase or decrease) over the feature. More importantly, this relationship holds over the entire feature domain and is independent of the other feature variables.

An example of a simple linear and monotonic response function (1 input variable x, 1 response variable y)

Linear regression models are examples of linear and monotonic functions, whereas random forests and neural networks are examples of models that exhibit highly non-linear and non-monotonic response functions.
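To make these two properties concrete, here is a minimal sketch (with made-up data and model choices, not taken from the original post) that fits a linear regression and a random forest on the same one-dimensional dataset and prints both response functions over a grid: the linear model’s response is linear and monotonic everywhere, while the random forest’s response is not constrained to be either.

```python
# Minimal sketch (hypothetical example): compare the response functions of a
# white-box model (linear regression) and a black-box-style model (random forest)
# on the same one-dimensional dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + 0.5 * X.ravel() + rng.normal(scale=0.3, size=200)

linear = LinearRegression().fit(X, y)          # globally linear and monotonic
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

grid = np.linspace(0, 10, 11).reshape(-1, 1)
for x, lin_pred, rf_pred in zip(grid.ravel(), linear.predict(grid), forest.predict(grid)):
    print(f"x={x:4.1f}  linear={lin_pred:5.2f}  forest={rf_pred:5.2f}")
```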

The following slide by Patrick Hall illustrates why white-box models (with linear and monotonic functions) are often preferred when clear and simple model explanations are required.

The top plot shows that the number of purchases increases when the age increases.

The response function has a linear and monotonic relationship at a global level, readily interpretable by all stakeholders.

A significant portion of the trend is however missed due to the linear and monotonic constraints of the white-box model.

By exploring more complex machine learning models, it is possible to better fit the observed data, although the response function is only linear and monotonic at a local level.

In order to interpret the model’s behaviour, it is necessary to investigate the model at a local level.

The scope of model interpretability, i.e. at a global or local level, is inherently linked to the complexity of the model.

Linear models exhibit the same behaviour across the entire feature space (as seen in the top plot) and they are thus globally interpretable.

The relationship between the input and output is often limited in complexity, and local interpretations (i.e. why does a model make a certain prediction at a certain data point?) default to global interpretations.

For more complex models, the global behaviour of the model is harder to define, and local interpretations of small regions of the response function are required. These small regions are more likely to behave linearly and monotonically, enabling a more accurate class of explanations.

ML libraries (e.g. scikit-learn) allow quick comparisons between different classifiers. When the dataset is limited in size and dimensionality, it is possible to interpret the results directly. In most real-life problems this is no longer the case.
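As a quick illustration (a minimal sketch on a synthetic dataset, not taken from the original post), scikit-learn makes it easy to line up several classifiers and compare their accuracy, even though accuracy alone says nothing about how each model arrives at its predictions:

```python
# Minimal sketch (synthetic data): quickly compare several classifiers with scikit-learn.
# Accuracy comparisons like this say nothing about *why* each model predicts what it does.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),   # white-box
    "decision tree": DecisionTreeClassifier(max_depth=4),       # white-box
    "random forest": RandomForestClassifier(n_estimators=200),  # black-box
    "gradient boosting": GradientBoostingClassifier(),          # black-box
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:>20}: test accuracy = {model.score(X_test, y_test):.3f}")
```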

In the remainder of this blog post, I will focus on two model-agnostic techniques that provide both global and local explanations.

These techniques can be applied to any machine learning algorithm and they enable interpretability by analysing the response function of the machine learning model.

Interpretability techniques

Surrogate models

Surrogate models are (generally simpler) models that are used to explain a more complex model. Linear models and decision tree models are often used because of their simple interpretation. The surrogate model is created to represent the decision-making process of the complex model (the response function): it is trained on the original inputs and the complex model’s predictions, rather than on the inputs and the targets.

 Surrogate models provide a layer of global interpretability on top of non-linear and non-monotonic models, but they should not be relied on exclusively.

Surrogate models are not able to perfectly represent the underlying response function, nor are they capable of capturing the complex feature relationships.

They serve primarily as a ‘global summary’ of a model.

The following steps illustrate how you can build a surrogate model for any black-box model (a minimal code sketch follows the list):

1. Train a black-box model.
2. Evaluate the black-box model on the dataset to obtain its predictions.
3. Choose an interpretable surrogate model (typically a linear model or a decision tree).
4. Train the interpretable model on the original dataset and the black-box model’s predictions.
5. Determine the surrogate model’s error measure and interpret the surrogate model.
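As a concrete illustration, here is a minimal sketch of these steps (the synthetic data and model choices are my own assumptions, not from the original post): a gradient boosting classifier plays the role of the black-box model, and a shallow decision tree is fit to its predictions as a global surrogate.

```python
# Minimal sketch of a global surrogate model (hypothetical example).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# 1. Train a black-box model on the original inputs and targets.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)
black_box = GradientBoostingClassifier().fit(X, y)

# 2. Evaluate the black-box model on the dataset to obtain its predictions.
black_box_preds = black_box.predict(X)

# 3. + 4. Train an interpretable surrogate on the inputs and the black-box predictions.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box_preds)

# 5. Check how faithfully the surrogate mimics the black-box model, then interpret it.
fidelity = accuracy_score(black_box_preds, surrogate.predict(X))
print(f"Surrogate fidelity (agreement with black-box predictions): {fidelity:.3f}")
print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(10)]))
```

The fidelity score here measures how well the surrogate reproduces the black-box predictions, which is the error measure to check before trusting the surrogate’s explanation.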

LIME

The general idea behind LIME is the same as for surrogate models. LIME, however, does not build a global surrogate model that represents the entire dataset; instead, it builds local surrogate models (linear models) that explain the predictions in local regions. A more in-depth explanation of LIME can be found in this blog post on LIME.

LIME provides an intuitive way to interpret model predictions for a given data point.

The following steps illustrate how you can build a LIME explanation for any black-box model (a minimal code sketch follows the list):

1. Train a black-box model.
2. Sample points in the local region of interest. Points can be retrieved from the dataset, or artificial points can be generated.
3. Weight the new samples by their proximity to the region of interest.
4. Fit a weighted, interpretable (surrogate) model on the dataset of sampled variations.
5. Interpret the local surrogate model.
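Here is a minimal sketch using the lime package (the synthetic data, model and parameters are illustrative assumptions, not from the original post); the explainer handles the sampling, weighting and weighted linear fit internally:

```python
# Minimal sketch of a local LIME explanation (hypothetical example).
# Requires: pip install lime
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Train a black-box model.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The explainer samples perturbed points around the instance of interest, weights them
# by proximity, and fits a weighted linear surrogate to the black-box predictions.
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["class 0", "class 1"], mode="classification")
explanation = explainer.explain_instance(X[0], black_box.predict_proba, num_features=5)

# Each (feature condition, weight) pair describes the local linear surrogate.
for feature_condition, weight in explanation.as_list():
    print(f"{feature_condition:>30}: {weight:+.3f}")
```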

Conclusion

There are several different techniques that you can use to improve the interpretability of your machine learning models. Although the techniques are becoming more robust as the field improves, it is important to always compare different techniques. A technique I didn’t discuss is Shapley values. Have a look at Christoph Molnar’s book ‘Interpretable Machine Learning’ to learn more about that (and other) techniques!

If you have any questions on interpretability in machine learning, I’ll be happy to read them in the comments. Follow me on Medium or Twitter if you want to receive updates on my blog posts!
