
[XAI 4] - LIME for Model Interpretability

I. What is LIME?

Local Interpretable Model-Agnostic Explanations (LIME) is a model-agnostic, local explanation technique for interpreting black-box models by learning an interpretable surrogate model around individual predictions. The technique was first proposed in the research paper "Why Should I Trust You?": Explaining the Predictions of Any Classifier, and the accompanying Python library can be installed from its GitHub repository. The algorithm can interpret any classifier or regressor in a faithful way by approximating it locally with an interpretable model, and by explaining a representative set of predictions it can also help establish trust in a black-box model from a more global perspective, which makes it useful for non-expert users, too. In short, LIME works by learning interpretable data representations, maintaining a balance in the fidelity-interpretability trade-off, and searching for local explorations. Let's look at each of these in detail.

1. Learning interpretable data representations

LIME differentiates between impactful features by choosing interpretable data representations that are understandable to any non-expert user, regardless of the actual, more complex features used by the algorithm. For example, when explaining models trained on unstructured data such as images, the underlying algorithm might use complex numerical feature vectors in its decision-making process, but these numerical feature values are incomprehensible to a non-technical end user. In comparison, if the explanation is given in terms of the presence or absence of a region of interest or superpixel (that is, a contiguous patch of pixels) within the image, that is a human-interpretable way of providing explainability.

Similarly, for text data, instead of using word-embedding vector values to interpret models, a better way to provide a human-interpretable explanation is to use the presence or absence of certain words that describe the target outcome of the model. Mathematically speaking, the original representation of a data instance being explained is denoted by $x \in \mathbb{R}^d$, where $d$ is the dimensionality of the original data. The binary vector of its interpretable representation is denoted by $x' \in \{0,1\}^{d'}$. Intuitively speaking, the algorithm denotes the presence or absence of human-interpretable components to explain any black-box model.
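As a toy illustration of this binary representation (the vocabulary and sentence below are made up for this example, not taken from the paper), a short piece of text can be mapped to $x'$ as follows:

```python
# Toy illustration: mapping a sentence to a binary interpretable
# representation x' over a small, made-up vocabulary of d' = 5 words.
vocabulary = ["great", "boring", "acting", "plot", "soundtrack"]
sentence = "great acting but a boring plot"

tokens = set(sentence.split())
x_prime = [1 if word in tokens else 0 for word in vocabulary]
print(x_prime)  # [1, 1, 1, 1, 0] -> presence/absence of each vocabulary word
```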

The figure below shows how LIME divides the input image into human-interpretable components that are later used to explain black-box models in a manner that is understandable to any non-technical user:

[Figure: an input image segmented into human-interpretable superpixel components]

Next, let’s discuss how to maintain the fidelity-interpretability trade-off.

2. Maintaining a balance in the fidelity-interpretability trade-off

LIME makes use of inherently interpretable models such as decision trees, linear models, and rule-based heuristic models to provide explanations to non-expert users through visual or textual artifacts. Mathematically speaking, an explanation is a model denoted by $g \in G$, where $G$ is the set of potentially interpretable models, and the domain of $g$ is the binary vector $\{0,1\}^{d'}$ representing the presence or absence of interpretable components. Additionally, the algorithm measures the complexity of an explanation alongside its interpretability. For example, even in interpretable models such as decision trees, the depth of the tree is a measure of complexity.

Mathematically speaking, the complexity of an interpretable model is denoted by $\Omega(g)$. LIME tries to maintain local fidelity while providing explanations, meaning that the algorithm tries to replicate the behavior of the model in the proximity of the individual data instance being predicted. The inventors of the algorithm used a proximity function, $\pi_x(z)$, to measure how close a data instance, $z$, is to the original representation, $x$, thereby defining a locality around $x$. Now, if $f(x)$ denotes the probability that $x$ belongs to a certain class, then to approximate $f$ the LIME algorithm measures how unfaithful $g$ is to $f$ in the locality defined by $\pi_x$, using a loss function, $L(f, g, \pi_x)$. The algorithm therefore tries to minimize the locality-aware loss, $L(f, g, \pi_x)$, while keeping $\Omega(g)$ low so that the explanation remains easily understandable to a non-expert user. The trade-off between interpretability and local fidelity is captured by the following objective:

$$\xi(x) = \underset{g \in G}{\operatorname{argmin}} \; L(f, g, \pi_x) + \Omega(g)$$

Hence, this trade-off measure depends on the interpretable models, $G$, the fidelity function, $L$, and the complexity measure, $\Omega$.
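For reference, the concrete instantiation proposed in the paper (sparse linear explanations) measures unfaithfulness with a squared loss weighted by an exponential kernel, where $\mathcal{Z}$ is the set of perturbed samples, $D$ is a distance function (for example, cosine distance for text or $L2$ distance for images), and $\sigma$ is the kernel width:

$$\pi_x(z) = \exp\left(-\frac{D(x, z)^2}{\sigma^2}\right), \qquad L(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z)\,\bigl(f(z) - g(z')\bigr)^2$$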

3. Searching for local explorations

The LIME algorithm is model-agnostic: it minimizes the locality-aware loss function, $L(f, g, \pi_x)$, without making any assumptions about $f$. LIME maintains local fidelity by using samples that are weighted by $\pi_x$ when approximating $L(f, g, \pi_x)$. Instances around $x'$ are sampled by drawing nonzero elements of $x'$ uniformly at random. Suppose a perturbed sample containing a fraction of the nonzero elements of $x'$ is denoted by $z' \in \{0,1\}^{d'}$. The algorithm then recovers the sample in the original representation, $z \in \mathbb{R}^d$, and obtains $f(z)$, which is used as a label for training the explanation model, $\xi(x)$.

The figure below reproduces an example presented in the original LIME paper, which intuitively explains the working of the algorithm using a visual representation:

[Figure: toy example from the LIME paper showing a complex decision function, proximity-weighted samples, and a locally faithful linear explanation]

In the figure above, the curve separating the light blue and pink backgrounds represents the complex decision function, $f$, of a black-box model. Since the decision function is not linear, approximating it globally with a linear model is not effective. The crosses and dots represent training data belonging to two different classes, and the bold cross represents the inference data instance being explained. The algorithm samples instances, obtains predictions for them using $f$, and then weights them by their proximity to the instance being explained. In the diagram, the sizes of the red crosses and blue dots vary according to this proximity: sampled instances close to $x$ receive a high weight from $\pi_x$, while instances far away from it receive a low weight. The original black-box model might be too complex to explain globally, but the LIME framework can provide an explanation that is appropriate in the locality of the data instance, $x$. The learned explanation is illustrated by the dashed line, which is locally (but not globally) faithful.
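To make this picture concrete, the following is a minimal, self-contained sketch of the local surrogate idea for tabular data. The Gaussian perturbation scheme, the exponential proximity kernel, and the ridge surrogate are simplified, illustrative choices rather than the exact internals of the lime library:

```python
# Minimal sketch of LIME's core loop for one tabular instance:
# perturb -> query the black box -> weight by proximity -> fit a local linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(predict_proba, x, num_samples=1000, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1. Sample perturbed instances z around x (simple Gaussian noise).
    Z = x + rng.normal(scale=0.5, size=(num_samples, d))

    # 2. Query the black-box model f on the perturbed samples.
    y = predict_proba(Z)[:, 1]  # probability of the class being explained

    # 3. Weight each sample by its proximity to x (exponential kernel pi_x).
    distances = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4. Fit an interpretable surrogate g on the weighted samples.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=weights)

    # The coefficients act as the local feature attributions.
    return surrogate.coef_
```

The returned coefficients play the role of the dashed line in the figure: a linear model that is only faithful within the neighborhood defined by the proximity weights.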

The figure below provides a more intuitive illustration of the LIME algorithm for image data. From the original image, the algorithm generates a set of perturbed data instances:

[Figure: perturbed versions of the original image created by graying out different subsets of superpixels]

The perturbed instances, as shown in the figure above, are created by switching some of the interpretable components off; in the case of images, this is done by turning certain superpixels gray. The black-box model is then applied to each of the generated perturbed instances, and the probability of each instance producing the model's original outcome is computed. Next, an interpretable model (such as a simple, locally weighted linear model) is learned on this dataset, and finally the superpixels with the highest positive weights are presented as the explanation. In the next section, let's discuss why LIME is a good model explainer.

II. What makes LIME a good model explainer?

LIME enables non-expert users to understand the workings of black-box models that they might otherwise not trust. The following properties of LIME make it a good model explainer:

  • Human-interpretable: As discussed in the previous section, LIME provides explanations that are easy to understand, as it offers a qualitative way to relate the components of the input data to the model outcome.

  • Model-agnostic: Although the previous chapters covered various model-specific explanation methods, it is always an advantage if an explanation method can provide explainability for any black-box model. LIME makes no assumptions about the model while providing explanations and can work with any model.
  • Local fidelity: LIME tries to replicate the behavior of the model in the proximity of the data instance being predicted, so it provides local explainability for the specific instance being explained. This is important for any non-technical user who wants to understand the exact reasons behind the model's decision-making process.
  • Global intuition: Although the algorithm provides local explainability, it can also explain a representative set of instances to the end user, thereby providing a global perspective on the functioning of the model. SP-LIME provides this global understanding by explaining a collection of data instances. This will be covered in more detail in the next section.

Now that we understand the key advantages of the LIME framework, in the next section, let’s discuss the submodular pick algorithm of LIME, which is used for extracting global explainability.

III. SP-LIME

In order to make explanation methods more trustworthy, providing an explanation for a single data instance (that is, a local explanation) is not always sufficient; the end user might want a global understanding of the model in order to have more confidence in its robustness. So, the SP-LIME algorithm runs the explanation process on multiple diverse, yet carefully selected, instances and returns a non-redundant set of explanations.

Now, let me provide an intuitive understanding of the SP-LIME algorithm. The algorithm assumes that the time the end user can spend going through individual local explanations is limited and treats this as a constraint: the number of explanations that the end user is willing to examine is the budget of the algorithm, denoted by $B$. If $X$ denotes the set of instances, the task of selecting $B$ instances for the end user to analyze for model explainability is defined as the pick step. The pick step should not depend on the particular explanation method being used, and it needs to provide non-redundant explanations by picking a diverse, representative set of instances that shows how the model behaves from a global perspective. Therefore, the algorithm tries to avoid picking instances with similar explanations.

Mathematically, this idea is represented using an explanation matrix, $W$, of size $n \times d'$, where $n$ is the number of samples and $d'$ is the number of human-interpretable features. The algorithm also uses a global component importance vector, $I$, in which $I_j$ represents the global importance of component $j$ in the explanation space. Intuitively speaking, $I$ is formulated so that it assigns higher scores to features that explain many instances of the data. The set of instances selected to be shown to the user is denoted by $V$. Combining these, the algorithm defines a non-redundant coverage intuition function, $c(V, W, I)$, which computes the collective importance of all features that appear in at least one instance in set $V$. The pick problem is then about maximizing this weighted coverage function, as denoted by the following equation:

$$c(V, W, I) = \sum_{j=1}^{d'} \mathbb{1}\left[\exists i \in V : W_{ij} > 0\right] I_j, \qquad \text{Pick}(W, I) = \underset{V,\, |V| \le B}{\operatorname{argmax}} \; c(V, W, I)$$

The details of the algorithm that we just covered might feel slightly overwhelming at first. Intuitively, however, the algorithm carries out the following steps:

  1. The explanation model is run on all instances, $x$.

  2. The global importance of each individual component is computed.

  3. Then, the algorithm maximizes the non-redundant coverage intuition function, $c$, by iteratively adding the instance with the highest marginal coverage gain.

  4. Finally, the algorithm obtains the representative, non-redundant explanation set, $V$, and returns it.
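The following is a compact, illustrative sketch of this greedy pick step, written directly from the description above. The square-root aggregation used to compute $I$ is one choice suggested in the paper for text data; the function and variable names are otherwise simplifications rather than the lime library's actual implementation:

```python
# Illustrative greedy sketch of SP-LIME's pick step: given the explanation
# matrix W (n instances x d' interpretable components), select B instances
# that maximize the coverage c(V, W, I).
import numpy as np

def greedy_pick(W, budget):
    n, d_prime = W.shape
    I = np.sqrt(np.abs(W).sum(axis=0))        # global importance of each component
    V = []
    covered = np.zeros(d_prime, dtype=bool)

    for _ in range(budget):
        gains = np.full(n, -np.inf)
        for i in range(n):
            if i in V:                         # never pick the same instance twice
                continue
            newly_covered = covered | (np.abs(W[i]) > 0)
            gains[i] = I[newly_covered].sum()  # coverage if instance i is added
        best = int(np.argmax(gains))
        V.append(best)
        covered |= np.abs(W[best]) > 0
    return V                                   # indices of the picked instances
```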

In the next section, we will cover how the LIME Python framework can be used for classification problems using code examples.

IV. LIME for classification problems

Notebook
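The full walkthrough is in the linked notebook. As a minimal sketch of how the lime library is typically used on a tabular classification problem (the dataset, classifier, and parameter values below are illustrative stand-ins, not necessarily what the notebook uses):

```python
# Minimal sketch: explaining a single tabular prediction with the lime library.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train the "black-box" model that we want to explain.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Build the explainer from the training data statistics.
explainer = LimeTabularExplainer(
    training_data=X_train,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one test instance: LIME perturbs it, queries the model,
# and fits a locally weighted interpretable surrogate.
exp = explainer.explain_instance(
    data_row=X_test[0],
    predict_fn=model.predict_proba,
    num_features=4,
)
print(exp.as_list())       # (feature condition, local weight) pairs
# exp.show_in_notebook()   # interactive visualization inside Jupyter
```

Each pair returned by `as_list()` is a human-readable feature condition together with its weight in the local surrogate model.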

V. Potential pitfalls

In the previous section, we learned how easily the LIME Python framework can be used to explain black-box models for a classification problem. But unfortunately, the algorithm does have certain limitations, and there are a few scenarios in which the algorithm is not effective:

  • The choice of interpretable data representation and interpretable surrogate model still imposes limitations. Even though the underlying trained model is treated as a black box and no assumptions about it are made during the explanation process, certain representations are not powerful enough to capture some of the model's behaviors. For example, if we are trying to build an image classifier to distinguish between black-and-white images and color images, then the presence or absence of superpixels will not be useful for providing explanations.

  • As discussed earlier, LIME learns an interpretable model to provide local explanations, and these interpretable models are usually linear and non-complex. However, if the underlying black-box model is not linear even in the locality of the prediction, the LIME algorithm is not effective.

  • LIME explanations are highly sensitive to any change in the input data. Even a slight change in the input can drastically alter the explanation provided by LIME.

  • For certain datasets, LIME explanations are not robust: even for similar data instances, the explanations provided can be completely different, which might prevent end users from fully relying on them (a simple way to probe this is sketched after this list).

  • The algorithm is extremely prone to data drift and label drift. A slight drift between the training and inference data can produce completely inconsistent explanations. The authors of the paper A study of data and label shift in the LIME framework (Rahnama and Boström) describe experiments that can be used to evaluate the impact of data drift on the LIME framework. Due to this limitation, the goodness of approximation of LIME explanations (also referred to as fidelity) is considered to be low, which is not what is expected from a good explanation method.

  • Explanations provided by LIME depend on the choice of the algorithm's hyperparameters. As with most algorithms, the choice of hyperparameters can determine the quality of the explanations provided, and hyperparameter tuning is difficult for LIME because, usually, qualitative methods are used to evaluate the quality of its explanations.
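If you want to gauge how pronounced these stability issues are on your own data, one simple (and admittedly crude) probe is to explain the same instance more than once and compare the returned feature weights. The snippet below assumes the `explainer`, `model`, and `X_test` objects from the earlier classification example:

```python
# Crude stability probe: explain the same instance twice (LIME's sampling is
# random, so repeated runs can differ) and compare the local feature weights.
# Large differences hint at the sensitivity issues described above.
def explanation_weights(x, num_features=4):
    exp = explainer.explain_instance(x, model.predict_proba, num_features=num_features)
    return dict(exp.as_list())

w1 = explanation_weights(X_test[0])
w2 = explanation_weights(X_test[0])
for feature in sorted(w1):
    print(f"{feature:35s} {w1[feature]: .3f} {w2.get(feature, 0.0): .3f}")
```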

There are many research works that point out other limitations of the LIME algorithm. I have mentioned some of them in the References section and strongly recommend going through those papers for more details on the algorithm's limitations.

This post is licensed under CC BY 4.0 by the author.