Abstract:
Most high-performing machine learning models lack interpretability. Yet, especially in critical tasks, understanding a model's behavior is of fundamental importance.
In this talk, we address the lack of transparency of classification models. We propose post-hoc techniques to analyze the behavior of classifiers, from the perspective of both a single instance and a data subgroup. The proposed model-agnostic techniques build on the notion of a pattern, which captures relevant associations of attribute values and identifies data subgroups.
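To make the notion concrete, here is a minimal sketch that treats a pattern as a conjunction of attribute-value conditions selecting a data subgroup, which is one common reading of the term in pattern mining. The pandas encoding, the function name, and the toy data are illustrative assumptions, not the actual implementation.

```python
import pandas as pd

def subgroup(df: pd.DataFrame, pattern: dict) -> pd.DataFrame:
    """Return the instances of df satisfying every attribute=value condition."""
    mask = pd.Series(True, index=df.index)
    for attribute, value in pattern.items():
        mask &= df[attribute] == value
    return df[mask]

passengers = pd.DataFrame({
    "sex":   ["female", "male", "female", "male"],
    "class": ["3rd", "1st", "3rd", "3rd"],
})
# The pattern {sex=female, class=3rd} identifies a subgroup of two instances.
print(subgroup(passengers, {"sex": "female", "class": "3rd"}))
```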
At the instance level, we propose a rule-based method that explains the prediction of any classifier for a specific instance by analyzing the joint effect of subsets of feature values on that prediction. From the subgroup perspective, we address the problem of identifying and characterizing data subgroups in which a classification model (or a ranking system) behaves differently than on the overall data. Identifying these critical data subgroups plays an important role in many applications, such as model validation and testing, fairness evaluation, and bias identification.
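For the instance-level perspective, the following is a hedged sketch of one generic way to quantify the joint effect of a feature subset: replace the subset's values with values sampled from the data and measure the change in the classifier's predicted probability. The `model.predict_proba` call follows the scikit-learn convention; this marginalization scheme and all parameter names are assumptions for illustration, not necessarily the method presented in the talk.

```python
import numpy as np

def joint_effect(model, X, x, subset, n_samples=200, seed=0):
    """Estimate the joint effect of the features in `subset` on the
    prediction for instance x: the drop in the predicted probability of
    x's class when those feature values are replaced with values sampled
    from the data X (i.e., marginalized out)."""
    rng = np.random.default_rng(seed)
    proba = model.predict_proba(x.reshape(1, -1))[0]
    predicted_class = int(np.argmax(proba))
    # Copy x n_samples times and overwrite the subset's columns with
    # values taken from randomly sampled rows of the dataset.
    perturbed = np.tile(x, (n_samples, 1))
    sampled_rows = rng.integers(0, len(X), size=n_samples)
    perturbed[:, subset] = X[sampled_rows][:, subset]
    return proba[predicted_class] - model.predict_proba(perturbed)[:, predicted_class].mean()
```

For the subgroup perspective, the sketch below scores each single-condition subgroup by how far a performance metric inside the subgroup diverges from its value on the whole dataset. The choice of false positive rate as the metric, the `min_support` threshold, and the restriction to single-condition patterns are simplifying assumptions; the talk's techniques consider patterns more generally.

```python
import pandas as pd

def fpr(y_true: pd.Series, y_pred: pd.Series) -> float:
    """False positive rate: fraction of true negatives predicted positive."""
    negatives = y_true == 0
    return float((y_pred[negatives] == 1).mean()) if negatives.any() else float("nan")

def single_condition_divergence(df, y_true, y_pred, attributes, min_support=0.05):
    """Score each pattern {attribute=value} by how far the model's FPR on the
    corresponding subgroup deviates from the FPR on the whole dataset."""
    overall = fpr(y_true, y_pred)
    rows = []
    for attribute in attributes:
        for value in df[attribute].unique():
            mask = df[attribute] == value
            if mask.mean() < min_support:  # skip subgroups with too little support
                continue
            subgroup_fpr = fpr(y_true[mask], y_pred[mask])
            rows.append({"pattern": f"{attribute}={value}",
                         "support": mask.mean(),
                         "fpr": subgroup_fpr,
                         "divergence": subgroup_fpr - overall})
    return pd.DataFrame(rows).sort_values("divergence", ascending=False)
```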