
Blog Posts
Out of Distribution Detection: Knowing When AI Doesn’t Know
How do we know when an AI system is operating outside its intended knowledge boundaries?
By Eric Heim, Cole Frank
In Artificial Intelligence Engineering


10 Things Organizations Should Know About AI Workforce Development
This post outlines 10 recommendations developed in response to work with our mission partners in the Department of Defense.
By Jonathan Frederick, Dominic A. Ross, Eric Keylor, Cole Frank, Intae Nam
In Artificial Intelligence Engineering


DataOps: Towards More Reliable Machine Learning Systems
Decisions based on ML models can have significant consequences, and managing the raw material of ML systems, data, is a challenge. This post explains DataOps, an area that focuses on the management …
By Daniel DeCapria
In Artificial Intelligence Engineering

Evaluating LLMs for Text Summarization: An Introduction
Deploying LLMs without human supervision and evaluation can lead to significant errors. This post outlines the fundamentals of LLM evaluation for text summarization in high-stakes applications.
By Shannon Gallagher, Swati Rallapalli, Tyler Brooks
In Artificial Intelligence Engineering


The Essential Role of AISIRT in Flaw and Vulnerability Management
The SEI established the first Artificial Intelligence Security Incident Response Team (AISIRT) in 2023. This post discusses AISIRT's role in coordinating the response to flaws and vulnerabilities in AI systems.
By Lauren McIlvenny, Vijay S. Sarvepalli
In Artificial Intelligence Engineering


Enhancing Machine Learning Assurance with Portend
This post introduces Portend, a new open source toolset that simulates data drift in machine learning models and identifies the proper metrics to detect drift in production environments.
By Jeffrey Hansen, Sebastián Echeverría, Lena Pons, Gabriel Moreno, Grace Lewis, Lihan Zhan
In Artificial Intelligence Engineering


Introducing MLTE: A Systems Approach to Machine Learning Test and Evaluation
Machine learning systems are notoriously difficult to test. This post introduces Machine Learning Test and Evaluation (MLTE), a new process and tool to mitigate this problem and create safer, more …
By Alex Derr, Sebastián Echeverría, Katherine R. Maffey (AI Integration Center, U.S. Army), Grace Lewis
In Artificial Intelligence Engineering


The Myth of Machine Learning Non-Reproducibility and Randomness for Acquisitions and Testing, Evaluation, Verification, and Validation
Machine learning (ML) systems today face a reproducibility challenge. This post explores configurations that increase reproducibility and offers recommendations for addressing this challenge.
By Andrew O. Mellinger, Daniel Justice, Marissa Connor, Shannon Gallagher, Tyler Brooks
In Artificial Intelligence Engineering


Beyond Capable: Accuracy, Calibration, and Robustness in Large Language Models
For any organization seeking to responsibly harness the potential of large language models, we present a holistic approach to LLM evaluation that goes beyond accuracy.
By Matthew Walsh, David Schulker, Shing-hon Lau
In Artificial Intelligence Engineering


GenAI for Code Review of C++ and Java
Would ChatGPT-3.5 and ChatGPT-4o correctly identify errors in noncompliant code and correctly recognize compliant code as error-free?
By David Schulker
In Artificial Intelligence Engineering
