New Tool Detects Mismatches in Machine Learning System Development
February 14, 2023—The Software Engineering Institute recently released a new tool to help software developers better integrate machine learning (ML) components into software systems. Developers can use the TEC ML Mismatch Detection Tool, available on the SEI’s GitHub site, to identify disruptive disagreements among stakeholders’ assumptions about ML components.
With TEC, all the stakeholders in the development of an ML-enabled system use a formal set of descriptors to input their expectations for ML components, such as required computing resources or runtime metrics. The tool compares each stakeholder’s expectations for every descriptor and flags any mismatches or missing information. The stakeholders can then resolve problematic differences early in the development lifecycle, which means less rework and better-performing ML systems.
ML is part of a wide variety of software systems, from mobile banking applications to satellite image recognition. While ML has existed for decades, the best practices and processes for merging ML models into larger systems have yet to coalesce. Software developers often struggle to move ML components into production systems, and many ML-enabled systems do not operate as expected.
The development of ML systems typically involves three separate, often siloed teams, each with its own workflows and perspectives: data scientists build the ML model, software engineers incorporate the model into the software system, and operations staff move the model into production. These teams frequently fail to communicate with each other about the ML aspects of the project. The result is ML mismatch, a problem that occurs in the development, deployment, and operation of an ML-enabled system due to incorrect assumptions made about system elements by different stakeholders.
The problem is especially acute in the Department of Defense. “The DoD acquires machine learning models from contractors, so they’re dealing not just with separate teams, but separate organizations,” said Grace Lewis, SEI principal researcher, Tactical and AI-Enabled Systems (TAS) Initiative lead, and lead of the SEI’s ML mismatch research. “As the DoD adopts machine learning to solve mission-critical problems, the inability to detect and avoid machine learning mismatch creates delays, rework, and failure in the development, deployment, and evolution of these systems.”
Project delay and rework are hallmarks of ML mismatch, as in one real-world example cited by Lewis. “The data science team built a great model,” she said. “The production team thought it would be plug and play. But when they got the model, they realized they’d have to write a huge amount of code just to use its results.”
ML mismatch can lead to a host of other problems too: poor ML component performance, complicated connections between system components, insufficient computing resources, and the inability of the system even to detect these issues.
Lewis and her colleagues set out in 2019 to begin formalizing this murky space. Their survey of ML system developers, published in 2021 as the study “Characterizing and Detecting Mismatch in Machine-Learning-Enabled Systems,” revealed the information that stakeholders wish they had received from each other. Lewis and her team refined this information into a set of eight formal, machine-readable descriptors that define system attributes to avoid mismatch: system context, raw data, training data, data pipeline, trained model, development environment, production environment, and production data.
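To make the idea concrete, here is a minimal sketch of what a machine-readable descriptor entry might look like. The eight category names come from the study, but the field names and values are hypothetical placeholders, not the SEI’s published schema:

# The eight descriptor categories identified in the 2021 SEI study.
DESCRIPTOR_CATEGORIES = [
    "system context", "raw data", "training data", "data pipeline",
    "trained model", "development environment", "production environment",
    "production data",
]

# A hypothetical "trained model" entry as one stakeholder might fill it in;
# the field names below are illustrative, not TEC's actual schema.
trained_model_descriptor = {
    "category": "trained model",
    "output_format": "class probabilities, one row per input image",
    "evaluation_metric": "F1 score on a held-out test set",
    "computing_resources": "single GPU, 8 GB memory",
}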
The descriptors form the heart of the TEC ML Mismatch Detection Tool, released in January. Using TEC’s interface, stakeholders in an ML project enter descriptors for their parts of the system. All stakeholder descriptors are visible to other users in TEC, which can also import and export descriptors for easy sharing with other organizations. Like the detectives for which it was named (deTECtive), TEC compares the descriptors entered by the stakeholders and looks for discrepancies. It uses a set of rules to examine the information in the descriptors and generate alerts on missing or mismatching information.
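As a rough illustration of the kind of rule-based comparison the article describes (TEC’s actual rules and data model are not detailed here, so the function, fields, and values below are hypothetical), this sketch checks two stakeholders’ descriptor entries for missing or conflicting values:

def detect_mismatches(view_a, view_b):
    """Report missing or conflicting values between two stakeholders'
    descriptor entries (illustrative logic only, not TEC's actual rules)."""
    alerts = []
    for category in sorted(set(view_a) | set(view_b)):
        fields_a = view_a.get(category, {})
        fields_b = view_b.get(category, {})
        for field in sorted(set(fields_a) | set(fields_b)):
            value_a = fields_a.get(field)
            value_b = fields_b.get(field)
            if value_a is None or value_b is None:
                alerts.append(f"{category}.{field}: missing information")
            elif value_a != value_b:
                alerts.append(f"{category}.{field}: mismatch ({value_a!r} vs {value_b!r})")
    return alerts

# Hypothetical entries from two stakeholders for the trained-model descriptor.
data_scientist_view = {"trained model": {"output_format": "class probabilities",
                                         "evaluation_metric": "F1 on held-out set"}}
software_engineer_view = {"trained model": {"output_format": "single class label",
                                            "evaluation_metric": None}}

for alert in detect_mismatches(data_scientist_view, software_engineer_view):
    print(alert)

Run on these two hypothetical views, the sketch would flag both the conflicting output formats and the evaluation metric the software engineering team never received, which is the kind of early alert the article attributes to TEC.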
Catching these mismatches early in the development lifecycle saves developers time and money by preventing rework. By making the attributes and structure of the system explicit, TEC also promotes component reuse. “TEC documents the descriptors each stakeholder is bringing forward. Those can be reused and then built on,” said Rachel Brower-Sinning, an SEI research scientist who studies ML mismatch. Supporting documentation for each component can also be uploaded to TEC, promoting reuse across projects.
The SEI team is now using the ML mismatch descriptors to build tools that test ML components for production readiness. Testing ML components is a known challenge, especially for organizations that acquire ML components such as the DoD, and a top source of ML mismatch, according to the 2021 SEI study.
Lewis and her team invite anyone to download TEC, use it for an ML systems development project, and provide feedback. “Is the data TEC asks for easy to get?” suggested Alex Derr, an SEI associate software engineer and the TEC tool’s principal developer. “Is the data you're getting from other groups useful to your part of the ML system development? We’d also like to know if the mismatch alerts are useful or if there are things we could add to the tool.”
Users are also free to modify the open-source TEC software and share those modifications with the SEI. Perhaps more important, the team hopes the community will validate the ML mismatch descriptors, identify any information missing from them, and use them as a foundation for new tools.
Reducing a project’s ML mismatch is important, but TEC also encourages a more useful way of thinking about ML system development, according to Lewis. “The ML system development process tends to be very model-centric, and we need to move to a system-centric view,” she said. “TEC helps developers think about the model as part of a larger system and bring all the stakeholders together.”
Visit the SEI’s website to learn more about our ML mismatch research. Download the TEC ML Mismatch Detection Tool and the ML mismatch descriptors from the SEI’s GitHub site. Read the paper “Characterizing and Detecting Mismatch in Machine-Learning-Enabled Systems,” which describes the study behind the descriptors. Watch Grace Lewis’s SEI Research Review 2022 presentation on the SEI’s YouTube channel, and download her slides from the SEI Digital Library. To contact the SEI about TEC, the ML mismatch descriptors, or collaborations, email info@sei.cmu.edu.