Measuring Beyond Accuracy
• Conference Paper
Publisher
Software Engineering Institute
Abstract
Most machine learning (ML) projects focus on "accuracy" as the primary metric for model evaluation. While accuracy indicates how well a model performs on a test dataset at the time of model development, it does not capture other factors that determine the utility and usability of a machine learning model. Key considerations include robustness, resilience, calibration, confidence, alignment with evolving user requirements, and fit for mission and stakeholder needs as part of an integrated system, among others. In this paper, we explore what it means to measure beyond accuracy and define critical considerations for the test and evaluation of machine learning and, more broadly, artificial intelligence (AI) systems. With these measurement considerations defined, the AI engineering community will be better equipped to develop and implement comprehensive, applied methods for model evaluation, along with metrics suited to more realistic, real-world settings.
Part of a Collection
AI Engineering Assets
Proceedings of the AAAI Spring Symposium on AI Engineering, 2022