Blog Posts
Beyond Capable: Accuracy, Calibration, and Robustness in Large Language Models
For any organization seeking to responsibly harness the potential of large language models, we present a holistic approach to LLM evaluation that goes beyond accuracy.
By Matthew Walsh, David Schulker, Shing-hon Lau
In Artificial Intelligence Engineering
GenAI for Code Review of C++ and Java
Would ChatGPT-3.5 and ChatGPT-4o correctly identify errors in noncompliant code and correctly recognize compliant code as error-free?
By David Schulker
In Artificial Intelligence Engineering
Introduction to MLOps: Bridging Machine Learning and Operations
Machine learning operations (MLOps) has emerged as a critical discipline in artificial intelligence and data science. This post introduces MLOps and its applications.
By Daniel DeCapria
In Artificial Intelligence Engineering
Measuring AI Accuracy with the AI Robustness (AIR) Tool
Understanding your artificial intelligence (AI) system’s predictions can be challenging. In this post, SEI researchers discuss a new tool to help improve AI classifier performance.
By Michael D. Konrad, Nicholas Testa, Linda Parker Gates, Crisanne Nolan, David James Shepard, Julie B. Cohen, Andrew O. Mellinger, Suzanne Miller, Melissa Ludwick
In Artificial Intelligence Engineering
Generative AI and Software Engineering Education
Educators have had to adapt to rapid developments in generative AI to give their students a realistic perspective. In this post, experts discuss generative AI and software engineering education.
By Ipek Ozkaya, Douglas Schmidt (Vanderbilt University)
In Artificial Intelligence Engineering
3 Recommendations for Machine Unlearning Evaluation Challenges
Machine unlearning (MU) aims to develop methods that remove data points efficiently and effectively from a model without the need for extensive retraining. This post details our work to address machine unlearning evaluation challenges.
By Keltin Grimes, Collin Abidi, Cole Frank, Shannon Gallagher
In Artificial Intelligence Engineering
Weaknesses and Vulnerabilities in Modern AI: AI Risk, Cyber Risk, and Planning for Test and Evaluation
Modern AI systems pose consequential, poorly understood risks. This blog post explores strategies for framing test and evaluation practices based on a holistic approach to AI risk.
By Bill Scherlis
In Artificial Intelligence Engineering
Weaknesses and Vulnerabilities in Modern AI: Integrity, Confidentiality, and Governance
In the rush to develop AI, it is easy to overlook factors that increase risk. This post explores AI risk through the lens of confidentiality, governance, and integrity.
By Bill Scherlis
In Artificial Intelligence Engineering
Weaknesses and Vulnerabilities in Modern AI: Why Security and Safety Are So Challenging
This post explores concepts of security and safety for neural-network-based AI, including ML and generative AI, as well as AI-specific challenges in developing safe and secure systems.
By Bill Scherlis
In Artificial Intelligence Engineering
Auditing Bias in Large Language Models
This post discusses recent research that uses a role-playing scenario to audit ChatGPT, an approach that opens new possibilities for revealing unwanted biases.