
Using Triangulation to Evaluate Machine Learning Models

Presentation
In this presentation, Dr. Andrew Fast describes a series of questions and data queries that can be used to assess the effectiveness of a machine learning model.
Publisher

CounterFlow AI, Inc.

Abstract

There are few industries using machine learning models with more at stake than network security. Having a high-performing statistical model is critical: a false positive error creates unnecessary work for the network security team, while a false negative error increases exposure to malware, threat actors, and other threats. Since no machine learning model is perfect, our task as data scientists is to first convince ourselves, and then convince others, that we have a statistical model worthy of defending the network. Persuasion, though, can be difficult because many of the steps and assumptions that go into training a statistical model from data are difficult, if not impossible, to share accurately with the ultimate consumers of the model. As machine learning and other advanced statistical techniques become more widespread within the network analysis community, the need for accurate assessment of models for threat detection is also increasing.
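As a hedged illustration of this tradeoff (not material from the presentation; the counts below are hypothetical placeholders), the following Python sketch computes the false positive and false negative rates of an alert classifier from confusion-matrix counts:

```python
# Illustrative sketch only: quantify the two error types described above
# for a hypothetical network-alert classifier. All counts are made up.

def error_rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate) from confusion-matrix counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # benign traffic wrongly flagged -> wasted analyst effort
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # malicious traffic missed -> increased exposure to threats
    return fpr, fnr

# Hypothetical counts for one day of alerts
fpr, fnr = error_rates(tp=40, fp=200, tn=9700, fn=60)
print(f"False positive rate: {fpr:.3f} (unnecessary work for the security team)")
print(f"False negative rate: {fnr:.3f} (missed threats)")
```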

Drawing on ideas from the philosophy of science, such as falsifiability and counterfactuals, we present a framework for triangulating the performance of machine learning models using a series of questions that help establish the validity of performance claims. In navigation, triangulation determines one's current location from the angles and distances to landmarks with known positions. We believe triangulation of a different sort is necessary to determine the performance of machine learning models. Each of the steps that goes into building a machine learning model, including input data selection, sampling, outcome variable selection, feature creation, model selection, and the choice of evaluation criteria, shapes the final model and provides necessary context for interpreting the performance results. Our framework highlights ways to uncover assumptions hidden in those choices, identify higher-performing models, and ultimately better defend our networks.
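To make the role of these hidden choices concrete, here is a small hedged sketch (our illustration, not from the presentation): a trivial model that labels every flow benign can report very different accuracy depending solely on how the evaluation sample was drawn, which is exactly the kind of assumption the triangulation questions are meant to surface.

```python
# Illustrative sketch only: the same trivial "always predict benign" model
# looks strong or useless depending on the evaluation sample, so a reported
# accuracy number is meaningless without knowing how the data were sampled.

def accuracy_of_always_benign(n_benign, n_malicious):
    """Accuracy of a model that labels every example benign."""
    return n_benign / (n_benign + n_malicious)

# Evaluated on raw, heavily imbalanced traffic, accuracy looks excellent...
print(accuracy_of_always_benign(n_benign=99_000, n_malicious=1_000))  # 0.99

# ...but on a balanced evaluation sample the same model is no better than chance.
print(accuracy_of_always_benign(n_benign=1_000, n_malicious=1_000))   # 0.50
```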

Part of a Collection

FloCon 2019 Presentations

This content was created for a conference series or symposium and does not necessarily reflect the positions and views of the Software Engineering Institute.