
Blog Posts
Protecting AI from the Outside In: The Case for Coordinated Vulnerability Disclosure
This post highlights lessons learned from applying the coordinated vulnerability disclosure (CVD) process to reported vulnerabilities in AI and ML systems.
By Allen D. Householder, Vijay S. Sarvepalli, Jeff Havrilla, Matt Churilla, Lena Pons, Shing-hon Lau, Nathan M. VanHoudnos, Andrew Kompanek, Lauren McIlvenny
In Artificial Intelligence Engineering


Introducing MLTE: A Systems Approach to Machine Learning Test and Evaluation
Machine learning systems are notoriously difficult to test. This post introduces Machine Learning Test and Evaluation (MLTE), a new process and tool to mitigate this problem and create safer, more …
By Alex Derr, Sebastián Echeverría, Katherine R. Maffey (AI Integration Center, U.S. Army), Grace Lewis
In Artificial Intelligence Engineering


The Myth of Machine Learning Non-Reproducibility and Randomness for Acquisitions and Testing, Evaluation, Verification, and Validation
Machine learning (ML) systems face a reproducibility challenge today. This post explores configurations that increase reproducibility and offers recommendations for addressing that challenge.
By Andrew O. Mellinger, Daniel Justice, Marissa Connor, Shannon Gallagher, Tyler Brooks
In Artificial Intelligence Engineering


Beyond Capable: Accuracy, Calibration, and Robustness in Large Language Models
For any organization seeking to responsibly harness the potential of large language models, we present a holistic approach to LLM evaluation that goes beyond accuracy.
By Matthew Walsh, David Schulker, Shing-hon Lau
In Artificial Intelligence Engineering


GenAI for Code Review of C++ and Java
Would ChatGPT-3.5 and ChatGPT-4o correctly identify errors in noncompliant code and correctly recognize compliant code as error-free?
By David Schulker
In Artificial Intelligence Engineering

Introduction to MLOps: Bridging Machine Learning and Operations
Machine learning operations (MLOps) has emerged as a critical discipline in artificial intelligence and data science. This post introduces MLOps and its applications.
By Daniel DeCapria
In Artificial Intelligence Engineering

Measuring AI Accuracy with the AI Robustness (AIR) Tool
Understanding your artificial intelligence (AI) system’s predictions can be challenging. In this post, SEI researchers discuss a new tool to help improve AI classifier performance.
By Michael D. Konrad, Nicholas Testa, Linda Parker Gates, Crisanne Nolan, David James Shepard, Julie B. Cohen, Andrew O. Mellinger, Suzanne Miller, Melissa Ludwick
In Artificial Intelligence Engineering


Generative AI and Software Engineering Education
Educators have had to adapt to rapid developments in generative AI to provide a realistic perspective to their students. In this post, experts discuss generative AI and software engineering education.
By Ipek Ozkaya, Douglas Schmidt (William & Mary)
In Artificial Intelligence Engineering


3 Recommendations for Machine Unlearning Evaluation Challenges
Machine unlearning (MU) aims to develop methods to remove data points efficiently and effectively from a model without the need for extensive retraining. This post details our work to address …
By Keltin Grimes, Collin Abidi, Cole Frank, Shannon Gallagher
In Artificial Intelligence Engineering


Weaknesses and Vulnerabilities in Modern AI: AI Risk, Cyber Risk, and Planning for Test and Evaluation
Modern AI systems pose consequential, poorly understood risks. This blog post explores strategies for framing test and evaluation practices based on a holistic approach to AI risk.
By Bill Scherlis
In Artificial Intelligence Engineering
