Machine learning is now active not only in research but also in production. But when serving clients, a black-box machine learning model might not work:
In addition to the predictions given by the model, answering these questions could help our end users understand their dataset and their problem more clearly. (Machine learning can serve as a form of EDA.)
And we can see from the above that machine learning model interpretation is usually restricted to two domains:
And we usually care about:
SHAP stands for SHapley Additive exPlanations. Its core is the Shapley value.
The Shapley value provides an additive way to calculate the contribution of each feature to the model's prediction. Mathematically, $y_i = \sum_j \phi_{i, j}$, where $i$ is the index of the sample and $j$ is the index of the feature. (In the SHAP library, the sum also includes a base value $\phi_0$, the average model output.)
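To make the additive property concrete, here is a minimal sketch using the `shap` library with a scikit-learn random forest. The model and dataset are illustrative assumptions, not part of the original example; the point is only that the base value plus a sample's per-feature SHAP values reconstructs that sample's prediction.

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative model and data (any tree model supported by TreeExplainer works).
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)
base = np.ravel(np.asarray(explainer.expected_value))[0]  # average model output

# Additivity: base value + sum of a sample's SHAP values == its prediction.
i = 0
print(np.isclose(base + shap_values[i].sum(), model.predict(X[i:i + 1])[0]))
```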
A very simple example would be using features A and B to predict $y$ ($X = \{A, B\} \rightarrow Y$).
Hence the Shapley values of A and B for this sample are weighted sums, 0.25 and 0.45, respectively. Therefore we know that, for this sample, feature B contributes more to the model's prediction than A (comparing absolute values).
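The "weighted sum" here is the average of each feature's marginal contribution over every order in which the features could be added; for two features, $\phi_A = \tfrac{1}{2}[v(\{A\}) - v(\emptyset)] + \tfrac{1}{2}[v(\{A,B\}) - v(\{B\})]$, and symmetrically for B. Below is a brute-force sketch of that computation. The coalition payoffs are hypothetical placeholders (chosen so the output happens to reproduce the 0.25 and 0.45 above); the original example's payoff table is not shown here.

```python
from itertools import permutations

def shapley_two_features(v):
    """Average each feature's marginal contribution over all orderings."""
    features = ["A", "B"]
    phi = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        seen = frozenset()
        for f in order:
            phi[f] += v(seen | {f}) - v(seen)  # marginal contribution of f
            seen = seen | {f}
    return {f: phi[f] / len(orderings) for f in features}

# Hypothetical payoffs: v(S) = expected model output given only features in S.
payoffs = {frozenset(): 0.0, frozenset({"A"}): 0.2,
           frozenset({"B"}): 0.4, frozenset({"A", "B"}): 0.7}
print(shapley_two_features(lambda s: payoffs[frozenset(s)]))
# {'A': 0.25, 'B': 0.45}
```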