On Interpreting ML Models

This article deals with “what-if” analysis: experimenting with a model’s inputs to understand its behavior without digging into its internal details. Detaching interpretation from model building in this way enables effective visualizations and sidesteps the interpretability-accuracy trade-off.

Rahul S



As ML technology has advanced, model interpretability has emerged as a significant challenge. Many practitioners believe highly complex black-box or deep learning (DL) models are inherently uninterpretable. This perceived dilemma has created a divide within the community, forcing users to choose between models that are interpretable but less accurate or models that are accurate but lack transparency.

Traditionally, statistical models have focused on interpreting models by building narratives around the model’s coefficients. In linear regression, for example, interpreting the model involves contextualizing the beta coefficients, while logistic regression relies on the “odds ratio” to construct a narrative.
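As a minimal sketch of the traditional approach, the snippet below fits a logistic regression on synthetic data (the feature, coefficient, and data-generating process are all made up for illustration) and exponentiates the fitted coefficient to obtain the odds ratio used in the narrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: one feature whose true log-odds contribution is 0.8 * x.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
p = 1 / (1 + np.exp(-0.8 * X[:, 0]))
y = rng.binomial(1, p)

model = LogisticRegression().fit(X, y)

# exp(beta) is the odds ratio: the multiplicative change in the odds of
# the positive class for a one-unit increase in x.
odds_ratio = np.exp(model.coef_[0, 0])
print(f"odds ratio per unit increase in x: {odds_ratio:.2f}")
```

The narrative then reads directly off that number: an odds ratio of roughly 2 means each one-unit increase in the feature about doubles the odds of the positive class.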


With this approach, interpreting a deep learning model comprising hundreds of layers and millions of coefficients is almost impossible. So we need to redefine what it means to interpret a model.

One practical and effective solution is to engage in “what-if” analysis. This type of interpretation involves experimenting with the model by asking questions like “What happens if we change the input in a specific way?”

Since ML models provide predictions based on input combinations, we can design different input scenarios and observe the corresponding outputs. With linear models, linear changes in the input produce linear changes in the output. With DL models, the same linear changes in the input reveal nonlinear changes in the output.
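The contrast above can be demonstrated with a small what-if sweep. This sketch (the quadratic data and choice of a gradient-boosted model as the nonlinear stand-in are assumptions for illustration) steps one input over an evenly spaced grid and compares how each model's predictions move:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data with a nonlinear relationship: y = x^2 + noise.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(400, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=400)

linear = LinearRegression().fit(X, y)
boosted = GradientBoostingRegressor(random_state=0).fit(X, y)

# What-if: sweep the input over an evenly spaced grid.
grid = np.linspace(-3, 3, 7).reshape(-1, 1)

# Step-to-step changes in each model's prediction along the sweep.
lin_steps = np.diff(linear.predict(grid))   # equal input steps -> equal output steps
gb_steps = np.diff(boosted.predict(grid))   # equal input steps -> unequal output steps

print("linear model steps: ", np.round(lin_steps, 2))
print("boosted model steps:", np.round(gb_steps, 2))
```

The linear model's output steps are all identical, while the boosted model's steps vary with the region of input space, exposing the nonlinearity without inspecting a single internal parameter.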