Understanding how LIME explains predictions

LIME produces an explanation by approximating the black-box model with an interpretable model (for example, a linear model with a few non-zero coefficients) in the neighborhood of the instance we want to explain. In short, the explanation for a prediction is built from the components of an interpretable model (for instance, the coefficients of a linear regression) which resembles the black-box model in the vicinity of the point of interest and which is trained on a new data representation chosen to ensure interpretability (a minimal code sketch of this local fit is given at the end of this section).

Interpretable representation

To ensure that the explanation is interpretable, LIME distinguishes an interpretable representation from the original feature space that the model uses. The interpretable representation has to be understandable to humans, so its dimension is not necessarily the same as that of the original feature space. Let p be the dimension of the original feature space X and let p’ be the dimension of the interpretable space X’. Interpretable inputs map to original inputs through a mapping function h_y: X’ → X, specific to the instance we want to explain, y ∈ X. Different types of mappings are used for different input spaces (illustrative sketches for the image and tabular cases are also given at the end of this section):

For text data, a possible interpretable representation is a binary vector indicating the presence or absence of a word, even though the classifier may use more complex and incomprehensible features such as word embeddings. Formally, X’ = {0,1}ᵖ’ ≡ {0,1} × ⋅⋅⋅ × {0,1}, where p’ is the number of words contained in the instance being explained, and the mapping function converts a vector of 1’s and 0’s (presence or absence of a word, respectively) into the representation used by the model: if the model uses word counts, the mapping sends 1 to the original word count and 0 to 0; if the model uses word embeddings, the mapping converts any sentence expressed as a vector of 1’s and 0’s into its embedded version.

For images, a possible interpretable representation is a binary vector indicating the presence or absence of a set of contiguous similar pixels (also called a super pixel). Formally, X’ = {0,1}ᵖ’, where p’ is the number of super pixels considered, usually obtained with an image segmentation algorithm such as quick shift. In this case, the mapping function maps 1 to leaving the super pixel as in the original image and 0 to graying the super pixel out (which represents it being missing).

Figure: Original representation (left) and interpretable representation (right) of an image. Sets of contiguous similar pixels (delimited by yellow lines) are called super pixels; each super pixel defines one interpretable feature. Source: https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime

For tabular data (i.e., matrices), the interpretable representation depends on the type of features: categorical, numerical or mixed. For categorical data, X’ = {0,1}ᵖ, where p is the actual number of features used by the model (i.e., p’ = p), and the mapping function maps 1 to the original category of the instance and 0 to a different category sampled according to the distribution of the training data. For numerical data, X’ = X and the mapping function is the identity; however, numerical features can be discretized so that they can be treated as categorical features.
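As a concrete illustration of the local-surrogate idea, here is a minimal sketch that fits a weighted linear model around a single instance of a black-box classifier. The Gaussian sampling scheme, the exponential proximity kernel, and the names `black_box_predict` and `local_surrogate` are assumptions made for illustration; the actual LIME implementation samples in the interpretable space and additionally selects a sparse subset of features.

```python
# Minimal sketch of LIME's local surrogate for numerical tabular data,
# where the interpretable space equals the original space (identity mapping).
# `black_box_predict` is a hypothetical function returning the probability
# of the class being explained for a batch of inputs.
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(black_box_predict, y, n_samples=5000, kernel_width=0.75):
    """Return the coefficients of a linear model fitted around instance y (shape (p,))."""
    rng = np.random.default_rng(0)
    p = y.shape[0]
    # Perturb the instance to generate a neighborhood around it.
    Z = y + rng.normal(scale=1.0, size=(n_samples, p))
    # Query the black-box model on the perturbed samples.
    preds = black_box_predict(Z)
    # Weight each sample by its proximity to y (exponential kernel).
    distances = np.linalg.norm(Z - y, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # The coefficients of this weighted linear fit form the explanation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, preds, sample_weight=weights)
    return surrogate.coef_
```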
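The image mapping h_y described above can be sketched as follows. The segmentation uses scikit-image's quickshift function with illustrative parameter values, and filling "absent" super pixels with the image's mean color is one common convention for representing a missing super pixel.

```python
# Sketch of the mapping h_y for images: 1 keeps a super pixel as in the
# original image, 0 replaces it (here with the mean color) to represent
# a "missing" super pixel. Parameter values are illustrative.
import numpy as np
from skimage.segmentation import quickshift

def h_y(z_prime, image, segments):
    """Map a binary vector z_prime in {0,1}^p' back to the image space X."""
    perturbed = image.copy()
    fill_color = image.mean(axis=(0, 1))  # stand-in for a greyed-out region
    for super_pixel, keep in enumerate(z_prime):
        if keep == 0:
            perturbed[segments == super_pixel] = fill_color
    return perturbed

# Usage with an RGB image of shape (H, W, 3):
# segments = quickshift(image, kernel_size=4, max_dist=200, ratio=0.2)
# p_prime = segments.max() + 1                      # number of super pixels
# z_prime = np.random.randint(0, 2, size=p_prime)   # a point in X'
# perturbed_image = h_y(z_prime, image, segments)
```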
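In practice, the tabular case, including the discretization of numerical features mentioned above, is handled by the `lime` Python package. The snippet below assumes its tabular explainer API and that a trained classifier `model`, training data `X_train`, a test instance `X_test[0]`, and the metadata lists are already defined.

```python
# Sketch of explaining a tabular prediction with the `lime` package,
# assuming `model`, `X_train`, `X_test`, `feature_names`, `class_names`
# and the categorical column indices `categorical_idx` are already defined.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=categorical_idx,
    discretize_continuous=True,  # treat numerical features as categorical bins
)

# Explain a single instance: the output pairs interpretable features
# (e.g., a discretized range) with the coefficients of the local linear model.
explanation = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(explanation.as_list())
```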
