(Feature Influence)
Marginal Effect (ME) captures the average response of a predictive model across those instances of a designated data set that share a specific value of a selected feature (Apley and Zhu 2020). The measure can be relaxed to also include similar feature values, namely those that fall within a fixed interval around the selected value.
It communicates global (with respect to the entire explained model) feature influence.
ME improves upon Partial Dependence (PD) (Friedman 2001) by ensuring that the influence estimates are based on realistic instances (thus respecting feature correlation), making the explanatory insights more truthful.
Property | Marginal Effect |
---|---|
relation | post-hoc |
compatibility | model-agnostic |
modelling | regression, crisp and probabilistic classification |
scope | global (per data set; generalises to cohort) |
target | model (set of predictions) |
data | tabular |
features | numerical and categorical |
explanation | feature influence (visualisation) |
caveats | feature correlation, heterogeneous model response |
\[ X_{\mathit{ME}} \subseteq \mathcal{X} \]
\[ V_i = \{ x_i : x \in X_{\mathit{ME}} \} \]
\[ \mathit{ME}_i = \mathbb{E}_{X_{\setminus i} | X_{i}} \left[ f \left( X_{\setminus i} , X_{i} \right) | X_{i}=v_i \right] = \int_{X_{\setminus i}} f \left( X_{\setminus i} , x_i \right) \; d \mathbb{P} ( X_{\setminus i} | X_i = v_i ) \;\; \forall \; v_i \in V_i \]
\[ \mathit{ME}_i = \mathbb{E}_{X_{\setminus i} | X_{i}} \left[ f \left( X_{\setminus i} , X_{i} \right) | X_{i}=V_i \right] = \int_{X_{\setminus i}} f \left( X_{\setminus i} , x_i \right) \; d \mathbb{P} ( X_{\setminus i} | X_i = V_i ) \]
Equivalently, based on the ICE notation (Goldstein et al. 2015), where \(S\) indexes the explained features and \(C\) their complement:
\[ \left\{ \left( x_{S}^{(i)} , x_{C}^{(i)} \right) \right\}_{i=1}^N \]
\[ \hat{f}_S = \mathbb{E}_{X_{C} | X_S} \left[ \hat{f} \left( X_{S} , X_{C} \right) | X_S = x_S \right] = \int_{X_C} \hat{f} \left( x_{S} , X_{C} \right) \; d \mathbb{P} ( X_{C} | X_S = x_S ) \]
\[ \mathit{ME}_i \approx \frac{1}{\sum_{x \in X_{\mathit{ME}}} \mathbb{1} (x_i = v_i)} \sum_{x \in X_{\mathit{ME}}} f \left( x | x_i=v_i \right) \]
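The empirical estimate above amounts to grouping the instances of the data set by the value of the explained feature and averaging the model's predictions within each group. The following is a minimal sketch using only the Python standard library; the function name `marginal_effect`, the toy model and the toy data are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from statistics import fmean


def marginal_effect(predict, instances, feature):
    """Average prediction for every value of `feature` observed in `instances`."""
    groups = defaultdict(list)
    for x in instances:
        # Predict on the *unmodified* instance, so only realistic data are used.
        groups[x[feature]].append(predict(x))
    return {value: fmean(preds) for value, preds in sorted(groups.items())}


def toy_model(x):
    """Hypothetical stand-in for the explained model f."""
    return 2.0 * x[0] + x[1]


# Hypothetical data set X_ME: feature 0 is explained, feature 1 is the rest.
X_me = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 4.0), (1, 6.0)]

me = marginal_effect(toy_model, X_me, feature=0)
print(me)  # {0: 2.0, 1: 6.0}
```

Because the instances are never altered, each average is taken only over feature combinations that actually occur in the data.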
The relaxed variant measures ME over a range of values \(v_i \pm \delta\) around a selected value \(v_i\), instead of precisely at that point.
\[ \mathit{ME}_i^{\pm\delta} = \mathbb{E}_{X_{\setminus i} | X_{i}} \left[ f \left( X_{\setminus i} , X_{i} \right) | X_{i}=v_i \pm \delta \right] = \int_{X_{\setminus i}} f \left( X_{\setminus i} , x_i \right) \; d \mathbb{P} ( X_{\setminus i} | X_i = v_i \pm \delta ) \;\; \forall \; v_i \in V_i \]
or, in the ICE notation,
\[ \hat{f}_S^{\pm\delta} = \mathbb{E}_{X_{C} | X_S} \left[ \hat{f} \left( X_{S} , X_{C} \right) | X_S = x_S \pm \delta \right] = \int_{X_C} \hat{f} \left( x_{S} , X_{C} \right) \; d \mathbb{P} ( X_{C} | X_S = x_S \pm \delta ) \]
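The relaxed estimate can be sketched in the same way: instead of requiring an exact match, it averages predictions over all instances whose explained feature lies within \(\delta\) of the selected value. The function name `relaxed_marginal_effect`, the toy model and the data below are hypothetical; returning `None` for an empty interval is a design choice of this sketch, not something prescribed by the source.

```python
from statistics import fmean


def relaxed_marginal_effect(predict, instances, feature, value, delta):
    """Average prediction over instances with |x[feature] - value| <= delta."""
    preds = [
        predict(x) for x in instances if abs(x[feature] - value) <= delta
    ]
    # No instance falls inside the interval: the estimate is undefined.
    return fmean(preds) if preds else None


def toy_model(x):
    """Hypothetical stand-in for the explained model f."""
    return x[0] + x[1]


# Hypothetical data set; feature 0 is the explained (numerical) feature.
X_me = [(0.75, 1.0), (1.0, 2.0), (1.25, 3.0), (2.0, 4.0)]

me_relaxed = relaxed_marginal_effect(
    toy_model, X_me, feature=0, value=1.0, delta=0.25
)
print(me_relaxed)  # 3.0
```

A larger \(\delta\) pools more instances and so yields a smoother but less localised estimate.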
Accumulated Local Effect (ALE) is an evolved version of (relaxed) ME that is less prone to being affected by feature correlation. It communicates the influence of a specific feature value on the model’s prediction by quantifying the average (accumulated) difference between the predictions at the boundaries of a (small) fixed interval around the selected feature value (Apley and Zhu 2020). It is calculated by replacing the value of the explained feature with the interval boundaries for those instances in the designated data set whose value of this feature falls within the specified range.
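The calculation described above can be sketched for a single interval as follows. This is only an illustration under stated assumptions: the function name `local_effect`, the toy model and the data are hypothetical, and the sketch covers one interval only, whereas the full method of Apley and Zhu (2020) additionally accumulates these differences across consecutive intervals and centres the result.

```python
from statistics import fmean


def local_effect(predict, instances, feature, lower, upper):
    """Average prediction difference between the two interval boundaries,
    computed over instances whose `feature` value lies in [lower, upper]."""
    diffs = []
    for x in instances:
        if lower <= x[feature] <= upper:
            x_low, x_up = list(x), list(x)
            x_low[feature] = lower  # move the instance to the lower boundary
            x_up[feature] = upper   # move the instance to the upper boundary
            diffs.append(predict(x_up) - predict(x_low))
    return fmean(diffs) if diffs else None


def toy_model(x):
    """Hypothetical model with an interaction between the two features."""
    return x[0] * x[1]


# Hypothetical data set; the last instance lies outside the interval.
X = [(1.0, 2.0), (1.5, 4.0), (3.0, 10.0)]

effect = local_effect(toy_model, X, feature=0, lower=1.0, upper=2.0)
print(effect)  # 3.0
```

Only instances already inside the interval are perturbed, and only by a small amount, which is why the resulting estimate relies less on unrealistic feature combinations.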
Individual Conditional Expectation (ICE) communicates the influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated range for a selected data point (Goldstein et al. 2015). It is an instance-focused (local) “variant” of Partial Dependence.
Partial Dependence (PD) communicates the average influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated range for a set of instances. It is a model-focused (global) “variant” of Individual Conditional Expectation, which is calculated by averaging ICE across a collection of data points (Friedman 2001).
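The relation between the two explainers, with the global curve obtained as the point-wise average of the per-instance curves, can be sketched as below. The function names `ice_curve` and `partial_dependence`, the toy model, the data and the grid are hypothetical choices made for this illustration.

```python
from statistics import fmean


def ice_curve(predict, x, feature, grid):
    """Predictions for one instance as `feature` is set to each grid value."""
    curve = []
    for value in grid:
        x_mod = list(x)
        x_mod[feature] = value  # overwrite the explained feature only
        curve.append(predict(x_mod))
    return curve


def partial_dependence(predict, instances, feature, grid):
    """Point-wise average of the ICE curves of all instances."""
    curves = [ice_curve(predict, x, feature, grid) for x in instances]
    return [fmean(column) for column in zip(*curves)]


def toy_model(x):
    """Hypothetical stand-in for the explained model f."""
    return x[0] * x[1]


X = [(1.0, 1.0), (2.0, 3.0)]  # hypothetical data set
grid = [0.0, 1.0, 2.0]        # hypothetical range for feature 0

ice = ice_curve(toy_model, X[0], feature=0, grid=grid)
pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)
print(ice)       # [0.0, 1.0, 2.0]
print(pd_curve)  # [0.0, 2.0, 4.0]
```

Unlike ME, both curves overwrite the explained feature for every instance regardless of its original value, so they may evaluate the model on feature combinations that never occur in the data.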
Python | R |
---|---|
N/A | DALEX |