Predicting Inference Degradation in Production ML Systems

Proposes developing empirically validated metrics and a test harness to predict a model's inference quality degradation due to different types of data drift.

Software Engineering Institute

Production machine learning (ML) systems frequently suffer from inference degradation: the loss of an ML model’s predictive quality over time, caused by differences between the characteristics of training data and production data. Current industry engineering practice sidesteps inference degradation through periodic retraining and model redeployment rather than through monitoring. These strategies become less feasible as DoD AI systems move into the operational AI space, which is characterized by fast tempo and resource constraints. Unfortunately, timely, reliable identification of inference degradation is difficult. The proposed solution is to develop a set of empirically validated metrics that predict when a model’s inference quality will degrade due to different types of data drift. The work will include developing a test harness that helps model developers define drift detection metrics and thresholds for their models. These metrics and thresholds can then be integrated into MLOps pipelines and monitoring infrastructures to detect inference degradation in production ML systems. Users of the test harness and drift detection metrics are expected to have better information for detecting inference degradation in deployed models, and thereby to avoid misinformed decisions, costly reengineering, and potential system decommissioning.
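To make the idea of a drift detection metric with a threshold concrete, here is a minimal Python sketch. It is not the project's test harness or one of its validated metrics. It assumes a single numeric feature and uses the two-sample Kolmogorov-Smirnov statistic as a stand-in metric; the function names `ks_statistic` and `drift_detected` and the default threshold of 0.3 are hypothetical choices made for illustration only.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    prod = sorted(production)
    if not ref or not prod:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap between the CDFs can only change at observed values,
    # so it is enough to evaluate it at every point in either sample.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_prod = bisect_right(prod, x) / len(prod)
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap


def drift_detected(reference, production, threshold=0.3):
    """Flag drift when the KS statistic exceeds the threshold.

    The default threshold is an arbitrary placeholder (an assumption),
    not an empirically validated value.
    """
    return ks_statistic(reference, production) > threshold
```

In a monitoring pipeline, `reference` would typically be a sample of a feature from the training data and `production` a recent window of the same feature from live traffic. Choosing a threshold that actually predicts a drop in inference quality for a given model is the kind of question the proposed metrics and test harness are meant to answer.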