# Individual Conditional Expectation (Feature Influence)
Individual Conditional Expectation (ICE) captures the response of a predictive model for a single instance when varying one of its features (Goldstein et al. 2015).
It communicates local (with respect to a single instance) feature influence.
| Property | Individual Conditional Expectation |
|---|---|
| relation | post-hoc |
| compatibility | model-agnostic |
| modelling | regression, crisp and probabilistic classification |
| scope | local (per instance; generalises to cohort or global) |
| target | prediction (generalises to model) |
| data | tabular |
| features | numerical and categorical |
| explanation | feature influence (visualisation) |
| caveats | feature correlation, unrealistic instances |
Given a designated data set

\[ X_{\mathit{ICE}} \subseteq \mathcal{X} \]

and a set of values of the explained feature \(x_i\)

\[ V_i = \{ v_i^{\mathit{min}} , \ldots , v_i^{\mathit{max}} \} \]

ICE is computed by evaluating the model for each instance with the explained feature fixed, in turn, to every one of these values

\[ f \left( x_{\setminus i} , x_i=v_i \right) \;\; \forall \; x \in X_{\mathit{ICE}} \; \forall \; v_i \in V_i \]

or, more compactly, in vector notation

\[ f \left( x_{\setminus i} , x_i=V_i \right) \;\; \forall \; x \in X_{\mathit{ICE}} \]
In the original notation (Goldstein et al. 2015), where each instance is partitioned into the explained features \(x_S\) and the remaining (complement) features \(x_C\)

\[ \left\{ \left( x_{S}^{(i)} , x_{C}^{(i)} \right) \right\}_{i=1}^N \]

each ICE curve is given by

\[ \hat{f}_S^{(i)} = \hat{f} \left( x_{S}^{(i)} , x_{C}^{(i)} \right) \]
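A minimal sketch of this computation in Python; the helper `ice_curves` and the toy model below are illustrative, not part of any library:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def ice_curves(model, X, feature, values):
    """Evaluate the model for every instance in X with the explained
    feature fixed, in turn, to each value of the grid (V_i)."""
    curves = np.empty((len(X), len(values)))
    for j, v in enumerate(values):
        X_mod = X.copy()
        X_mod[:, feature] = v  # fix x_i = v for all instances
        curves[:, j] = model.predict(X_mod)
    return curves  # one ICE curve per row

# toy data and model, purely for demonstration
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=50)    # V_i
curves = ice_curves(model, X[:25], feature=0, values=grid)  # X_ICE
```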
Centred ICE (c-ICE) anchors the curves at a fixed point, usually the lower end of the explained feature's range

\[ f \left( x_{\setminus i} , x_i=V_i \right) - f \left( x_{\setminus i} , x_i=v_i^{\mathit{min}} \right) \;\; \forall \; x \in X_{\mathit{ICE}} \]

or, in the original notation, with \(x^{\star}\) denoting the anchor value

\[ \hat{f} \left( x_{S}^{(i)} , x_{C}^{(i)} \right) - \hat{f} \left( x^{\star} , x_{C}^{(i)} \right) \]
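Building on the sketch above, centring amounts to subtracting each curve's value at the anchor point, here the first grid value (\(v_i^{\mathit{min}}\)):

```python
# Centred ICE (c-ICE): anchor every curve at the lowest grid value so
# that all curves start at zero and their shapes are easier to compare.
centred = curves - curves[:, [0]]
```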
Derivative ICE (d-ICE) visualises interaction effects between the explained feature and the remaining features by plotting the partial derivative of the explained model \(f\) with respect to the explained feature \(x_i\).
- When no interactions are present, all curves overlap.
- When interactions exist, the curves are heterogeneous.

This is because, in the absence of interactions, the model decomposes into additive components and the derivative does not depend on the remaining features

\[ f \left( x_{\setminus i} , x_i \right) = g \left( x_i \right) + h \left( x_{\setminus i} \right) \;\; \text{so that} \;\; \frac{\partial f(x)}{\partial x_i} = g^\prime(x_i) \]

or, in the original notation,

\[ \hat{f} \left( x_{S} , x_{C} \right) = g \left( x_{S} \right) + h \left( x_{C} \right) \;\; \text{so that} \;\; \frac{\partial \hat{f}(x)}{\partial x_{S}} = g^\prime(x_{S}) \]
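Continuing the same sketch, the derivative can be approximated numerically with finite differences over the feature grid; near-identical rows then indicate an additively separable model:

```python
# Derivative ICE (d-ICE), approximated by finite differences along the
# feature grid; heterogeneous rows point to interactions with x_i.
d_curves = np.gradient(curves, grid, axis=1)
```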
Under certain (quite restrictive) assumptions, ICE admits a causal interpretation (Zhao and Hastie 2021).
See Causal Interpretation of Partial Dependence (PD) for more detail.
Partial Dependence (PD) is the model-focused (global) version of Individual Conditional Expectation, calculated by averaging ICE curves across a collection of data points (Friedman 2001). It communicates the average influence of a specific feature value on the model's prediction by fixing the value of this feature across a designated set of instances.
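In the running sketch, Partial Dependence is simply the point-wise mean of the ICE curves; scikit-learn (>=0.24.0) can also compute both quantities in one call via `sklearn.inspection.partial_dependence`:

```python
# Partial Dependence (PD) is the point-wise average of the ICE curves.
pd_curve = curves.mean(axis=0)

# Equivalent computation with scikit-learn (>=0.24.0).
from sklearn.inspection import partial_dependence
result = partial_dependence(model, X, features=[0], kind="both")
# result["average"] holds the PD curve; result["individual"] the ICE curves.
```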
A Marginal plot (M-plot) communicates the influence of a specific feature value – or of similar values, i.e., an interval around the selected value – on the model's prediction by considering only the relevant instances found in the designated data set, i.e., those whose value of the explained feature falls within this interval. It is calculated as the average prediction for these instances.
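A rough sketch of this estimate under the same assumptions; the helper name and the window parameter are illustrative:

```python
def marginal_effect(model, X, feature, value, width):
    """M-plot style estimate: average the predictions of the instances
    whose observed value of the explained feature lies in a window
    around `value`; no feature values are replaced."""
    mask = np.abs(X[:, feature] - value) <= width / 2
    # assumes the window contains at least one instance
    return model.predict(X[mask]).mean()
```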
Accumulated Local Effects (ALE) communicate the influence of a specific feature value on the model's prediction by quantifying the average (accumulated) difference between the predictions at the boundaries of a (small) fixed interval around the selected feature value (Apley and Zhu 2020). They are calculated by replacing the value of the explained feature with the interval boundaries for those instances in the designated data set whose value of this feature falls within the specified range.
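A simplified first-order ALE sketch under the same assumptions; for brevity the final centring is unweighted, whereas Apley and Zhu (2020) weight it by bin populations:

```python
def ale_curve(model, X, feature, n_bins=10):
    """Accumulated Local Effects (first order): average the prediction
    difference across the edges of quantile bins, then accumulate."""
    edges = np.quantile(X[:, feature], np.linspace(0, 1, n_bins + 1))
    effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (X[:, feature] >= lo) & (X[:, feature] <= hi)
        if not mask.any():
            effects.append(0.0)  # skip empty bins
            continue
        X_lo, X_hi = X[mask].copy(), X[mask].copy()
        X_lo[:, feature] = lo  # move instances to the lower edge ...
        X_hi[:, feature] = hi  # ... and to the upper edge of the bin
        effects.append(np.mean(model.predict(X_hi) - model.predict(X_lo)))
    ale = np.cumsum(effects)            # accumulate the local effects
    return edges[1:], ale - ale.mean()  # centre the curve around zero
```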
| Python | R |
|---|---|
| scikit-learn (>=0.24.0) | iml |
| PyCEbox | ICEbox |
| alibi | pdp |
| | DALEX |