Updated Machine Learning Test and Evaluation Tool and Process Adds Major Features

May 28, 2025—Machine learning (ML) models destined for integration into a larger software system are frequently developed in isolation, making it difficult to test and evaluate them against system and operational requirements. This limitation can lead to failures in production, putting at risk warfighters who depend on ML-enabled capabilities, such as situational awareness and threat recognition enabled by computer vision models. The Software Engineering Institute (SEI) created the Machine Learning Test and Evaluation (MLTE) process and tool to help ensure ML models are production ready. Earlier this month, the SEI released version 2.0.0 of MLTE, which adds major new features.
Most ML model developers know little about the overarching system or its operational environment. Software engineers and quality assurance teams often lack ML model specifications. These silos prevent system stakeholders from knowing how well the ML model will work in production. MLTE version 1.0, released in October 2024, applied software engineering best practices to bridge these silos and enable informed test and evaluation (T&E) of ML models. MLTE is a system-centric, quality-attribute-driven, semi-automated process and tool for negotiating, specifying, and testing ML model and system qualities. It can import information from TEC, an earlier SEI tool that detects mismatched expectations among the teams building ML-enabled systems. Both TEC and MLTE are part of an SEI effort to establish integrated T&E of ML capabilities throughout the Department of Defense.
Early adopters and evaluators of MLTE drove the new features and improvements in version 2.0.0:
- better integration of the Negotiation Card into the MLTE process by tracking Quality Attribute Scenarios as requirements and linking them to the actual tests
- improved Test Suite specification, including Test Cases linked to Quality Attribute Scenarios, Validators that can be created independently from Evidence types, and Measurements that can be set directly in the Test Suite and then easily re-executed for all tests (see the sketch after this list)
- support for Python 3.12
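
To make the Test Suite concepts above concrete, here is a minimal, self-contained Python sketch of the idea: each test case carries a Quality Attribute Scenario identifier, validators are defined separately from the measurements whose evidence they judge, and a suite runner re-executes every measurement. All names here (QASTestCase, run_suite, the QAS-3 scenario, and its threshold) are hypothetical illustrations, not the actual MLTE 2.0.0 API; consult the project's documentation on GitHub for real usage.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch only; these names are hypothetical and do not
# reflect the actual MLTE 2.0.0 API.

@dataclass
class QASTestCase:
    """A test case linked to a Quality Attribute Scenario (QAS)."""
    qas_id: str                         # requirement this test traces to
    measurement: Callable[[], float]    # produces evidence (a measured value)
    validator: Callable[[float], bool]  # judges evidence, defined separately

def run_suite(cases: list[QASTestCase]) -> None:
    """Re-execute every measurement and validate the resulting evidence."""
    for case in cases:
        value = case.measurement()
        status = "PASS" if case.validator(value) else "FAIL"
        print(f"{case.qas_id}: measured {value} -> {status}")

if __name__ == "__main__":
    # Hypothetical scenario: inference latency must stay under 500 ms,
    # a threshold that would come from the Negotiation Card.
    suite = [
        QASTestCase(
            qas_id="QAS-3: inference latency under 500 ms",
            measurement=lambda: 412.0,        # stand-in for a real timing run
            validator=lambda ms: ms < 500.0,  # independent of evidence type
        ),
    ]
    run_suite(suite)
```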
The new version of MLTE creates stronger links between ML model requirements, derived from the larger system, and the tests that verify those requirements are satisfied. This kind of T&E, which considers the system and its operational context, leads to models that are production ready. It also prevents the lengthy rework that follows when ML models fail during operational tests or in production, so ML capabilities reach the warfighter faster.
Download MLTE 2.0.0 from the project’s GitHub site. Read more about the tool’s background in the papers Using Quality Attribute Scenarios for ML Model Test Case Generation and MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities. Learn more about MLTE from the MLTE fact sheet, SEI Podcast Series, and SEI Blog.