Software Engineering for Machine Learning
Software Engineering Institute
One problem with deploying machine learning (ML) systems in production environments is that their development and operation involve three perspectives, with three different and often completely separate workflows and people: the data scientist builds the model; the software engineer integrates the model into a larger system; and operations staff deploy, operate, and monitor the system. Because these perspectives operate separately and often speak different languages, the assumptions each perspective makes about the elements of the ML-enabled system can fail to match the guarantees those elements actually provide.
We conducted a study with practitioners to identify mismatches and their consequences. In parallel, we conducted a multi-vocal literature study to identify software engineering best practices for ML systems that could address the identified mismatches. The result is a set of machine-readable descriptors that codify attributes of system elements and thereby make all assumptions explicit. System stakeholders can use the descriptors manually, for information awareness and evaluation activities, and automated mismatch detectors can use them at design time and runtime for attributes that lend themselves to automation.
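The abstract does not show the descriptor format itself. As a minimal sketch, assuming hypothetical descriptors for a trained model and its deployment environment (the class names, fields, and values below are illustrative and are not the actual SEI descriptors), an automated design-time mismatch detector could compare what the data scientist's model assumes against what the operations environment guarantees:

```python
from dataclasses import dataclass


@dataclass
class ModelDescriptor:
    # Hypothetical attributes a data scientist might record about a trained model.
    name: str
    input_image_size: tuple[int, int]
    max_inference_latency_ms: float
    required_memory_mb: int


@dataclass
class EnvironmentDescriptor:
    # Hypothetical attributes operations staff might record about the deployment target.
    name: str
    camera_image_size: tuple[int, int]
    latency_budget_ms: float
    available_memory_mb: int


def find_mismatches(model: ModelDescriptor, env: EnvironmentDescriptor) -> list[str]:
    """Return a description of each model assumption the environment does not satisfy."""
    mismatches = []
    if model.input_image_size != env.camera_image_size:
        mismatches.append(
            f"input size: model expects {model.input_image_size}, "
            f"environment provides {env.camera_image_size}"
        )
    if model.max_inference_latency_ms > env.latency_budget_ms:
        mismatches.append(
            f"latency: model needs up to {model.max_inference_latency_ms} ms, "
            f"budget is {env.latency_budget_ms} ms"
        )
    if model.required_memory_mb > env.available_memory_mb:
        mismatches.append(
            f"memory: model needs {model.required_memory_mb} MB, "
            f"environment has {env.available_memory_mb} MB"
        )
    return mismatches


if __name__ == "__main__":
    # Illustrative values only.
    model = ModelDescriptor("defect-classifier", (224, 224), 40.0, 512)
    env = EnvironmentDescriptor("edge-device", (640, 480), 50.0, 1024)
    for mismatch in find_mismatches(model, env):
        print(mismatch)
```

With these illustrative values, the only detected mismatch is the input image size: the model expects 224×224 images, while the deployment camera produces 640×480 frames. Attributes that cannot be compared mechanically would still fall to the manual evaluation activities the descriptors also support.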
This study showed that many of the mismatch examples stemmed from a lack of understanding of how to monitor ML systems to detect problems with the quality of the inferences made by deployed models. In this talk, we also introduce a new project that will develop novel metrics to predict when a model’s inference quality will degrade below a threshold. The metrics are expected to determine (1) when a model really needs to be retrained, so that resources are not spent on unnecessary retraining, and (2) when a model needs to be retrained before its scheduled retraining time, so as to minimize the time that the model produces suboptimal results. The metrics will be validated on models that use convolutional neural networks (CNNs), which are state of the art and ubiquitous in computer vision and are relevant to Department of Defense (DoD) systems such as surveillance, autonomous vehicles, landmine removal, manufacturing quality control, facial recognition, captured enemy material (CEM) analysis, and disaster response.
What attendees will learn:
- Perspectives involved in the development and operation of ML systems
- Types of mismatch that occur in the development of ML systems
- Future work in software engineering for ML systems
About the Speakers
Grace Lewis is a Principal Researcher and the lead for the Tactical and AI-Enabled Systems (TAS) Initiative at the Carnegie Mellon Software Engineering Institute (SEI). She is a Principal Investigator for two projects in the growing field of software engineering for machine-learning (ML) systems: “Characterizing and Detecting Mismatch in ML-Enabled …
Ipek Ozkaya is a principal researcher and the technical director of the Engineering Intelligent Software Systems group at the SEI. Ozkaya’s primary interests include developing techniques for improving software development efficiency and system evolution with an emphasis on software architecture practices, software economics, and agile development. Ozkaya’s most recent research …