Release of SCAIFE System Version 1.0.0 Provides Full GUI-Based Static-Analysis Adjudication System with Meta-Alert Classification

The SEI Source Code Analysis Integrated Framework Environment (SCAIFE) is a modular architecture designed to enable a wide variety of tools, systems, and users to use artificial intelligence (AI) classifiers for static-analysis meta-alerts at relatively low cost and effort. SCAIFE uses automation to reduce the significant manual effort required to adjudicate meta-alerts that are produced by static-analysis tools. The architecture also enables

low-effort integration for tools to incorporate mathematical formulas for meta-alert prioritization,
data aggregation in one location to improve classification using a rich labeled dataset, and
a modular capability to use a variety of classifiers and active learning (also known as adaptive heuristics).
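To make the first item concrete, a prioritization formula can be as simple as a weighted sum over meta-alert fields. The sketch below is only an illustration: the field names (`confidence`, `severity`, `likelihood`), the weights, and the weighted-sum form are assumptions, not the SCAIFE Prioritization API.

```python
from dataclasses import dataclass


@dataclass
class MetaAlert:
    """Hypothetical meta-alert record; the field names are illustrative only."""
    meta_alert_id: int
    confidence: float  # assumed classifier estimate that the alert is a true positive (0..1)
    severity: int      # assumed scale: 1 (low) to 3 (high)
    likelihood: int    # assumed scale: 1 (unlikely) to 3 (likely)


def priority(alert, w_conf=10.0, w_sev=2.0, w_like=1.0):
    """Weighted-sum formula (an assumed form) with user-tunable weights."""
    return w_conf * alert.confidence + w_sev * alert.severity + w_like * alert.likelihood


def prioritize(alerts, **weights):
    """Return meta-alerts ordered from highest to lowest priority."""
    return sorted(alerts, key=lambda a: priority(a, **weights), reverse=True)


sample_alerts = [
    MetaAlert(1, confidence=0.20, severity=3, likelihood=1),
    MetaAlert(2, confidence=0.90, severity=2, likelihood=2),
    MetaAlert(3, confidence=0.50, severity=1, likelihood=3),
]
ranked_ids = [a.meta_alert_id for a in prioritize(sample_alerts)]
```

Changing the weights changes the ordering, so each team could tune the formula to its own priorities. Setting `w_conf=0.0`, for example, ranks purely on severity and likelihood.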

We developed the SEI SCAIFE system to instantiate that architecture, as a research prototype. With the March 2020 release of SCAIFE System v 1.0.0, which I describe in this blog post, users for the first time could run the functionality required for static-analysis classification fully from the SCAIFE user-interface module's graphical user interface (GUI). Users can now create a project and classifier, run the classifier on the user's project, and then see the classifier-determined confidence values displayed for the meta-alerts list.
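The sketch below gives a rough sense of what a classifier-determined confidence value is. It uses a deliberately trivial stand-in, a smoothed per-checker true-positive rate learned from previously adjudicated meta-alerts. It is not one of the classifiers SCAIFE uses, and the checker names and data layout are hypothetical.

```python
from collections import Counter


def train(labeled):
    """Count verdicts per checker.

    labeled: iterable of (checker_id, is_true_positive) pairs taken from
    previously adjudicated meta-alerts (hypothetical layout).
    """
    true_counts, totals = Counter(), Counter()
    for checker, verdict in labeled:
        totals[checker] += 1
        true_counts[checker] += int(verdict)
    return true_counts, totals


def confidence(model, checker):
    """Estimate P(true positive) with Laplace smoothing; unseen checkers get 0.5."""
    true_counts, totals = model
    return (true_counts[checker] + 1) / (totals[checker] + 2)


adjudicated = [
    ("null-deref", True),
    ("null-deref", True),
    ("null-deref", False),
    ("unused-var", False),
]
model = train(adjudicated)

# Unadjudicated meta-alerts: hypothetical ID -> checker that raised it.
unlabeled = {101: "null-deref", 102: "unused-var", 103: "buffer-overflow"}
scores = {mid: confidence(model, checker) for mid, checker in unlabeled.items()}
```

Each value in `scores` plays the role of the confidence shown next to a meta-alert in the list. A real classifier would draw on many more features than the checker alone.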

SCAIFE is designed so that a wide variety of static-analysis tools can integrate with the SCAIFE system using the SCAIFE application programming interface (API) definitions, one per SCAIFE module. Each SCAIFE system release includes the current SCAIFE API definitions, code that instantiates SCAIFE API function calls, and a multi-server SCAIFE system with all of the code that we at the SEI developed for it. We distribute releases as a lightweight code tarball or on a virtual machine (VM), in either case using Docker containers for the servers. The release also includes extensive documentation of the system in an HTML manual.

As part of my research project, we developed a version of the SEI CERT Division's SCALe (Source Code Analysis Laboratory) tool as a GUI front end for the SCAIFE system (as shown at the top of Figure 1 below). Through SCALe, auditors can examine alerts from one or more static-analysis tools and the associated code, make determinations about a meta-alert (e.g., mark it as true or false), and export the project-audit information to a database or file. We released the full SCAIFE v 1.0.0 system to five DoD collaborator teams.
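The relationship between tool alerts and the meta-alerts that auditors adjudicate can be sketched as a grouping step. The sketch assumes that a meta-alert collects every alert that maps to the same file, line, and code-flaw condition (for example, a CWE or CERT rule). The record layout and names are illustrative, not SCALe's actual schema.

```python
from collections import defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    """Hypothetical alert record emitted by one static-analysis tool."""
    tool: str
    filepath: str
    line: int
    condition: str  # code-flaw condition the alert maps to (assumed mapping)
    message: str


def group_into_meta_alerts(alerts):
    """Group alerts that share (filepath, line, condition) into one meta-alert."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.filepath, alert.line, alert.condition)].append(alert)
    return dict(groups)


tool_alerts = [
    Alert("tool_a", "src/util.c", 42, "CWE-476", "possible null dereference"),
    Alert("tool_b", "src/util.c", 42, "CWE-476", "pointer may be NULL"),
    Alert("tool_a", "src/main.c", 10, "CWE-190", "integer overflow"),
]
meta_alerts = group_into_meta_alerts(tool_alerts)

# An auditor records one verdict per meta-alert rather than one per tool alert.
verdicts = {key: None for key in meta_alerts}   # None means not yet adjudicated
verdicts[("src/util.c", 42, "CWE-476")] = True  # marked as a true positive
```

Here three alerts from two tools collapse into two meta-alerts, so the auditor makes two determinations instead of three.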

Although we cannot yet publicly publish the full SCAIFE system, we do publish some separable parts. We publish the version of the SCALe tool that is used as a SCAIFE user-interface (UI) module in the five-module SCAIFE system at https://github.com/cmu-sei/SCALe/tree/scaife-scale. That version of SCALe can be used as a standalone static-analysis aggregator and adjudication tool, and it can optionally serve as a pre-developed module in a different, new implementation of the SCAIFE architecture. The five SCAIFE API definitions are distributed with the full SCAIFE system, and we also distribute them publicly at https://github.com/cmu-sei/SCAIFE-API. Figure 1 below shows the modules that comprise the SCAIFE system:

Figure 1: Architecture of the Five-Module SCAIFE System

Figure 2 below shows the flow of data between the SCALe GUI front end and the SCAIFE System v 1.0.0.

Figure 2: Dataflow in SCAIFE v 1.0.0

What Is New in SCAIFE System v 1.0.0?

The release includes SCAIFE API v 1.0.0 definitions (YAML, JSON, and HTML) for the DataHub, Statistics, Registration, Prioritization, and UI modules. The release contains much new code that instantiates this version of the SCAIFE API. This release also contains new code (and "how-to" information in the manual) that enables users to regenerate the updated API definitions in YAML, JSON, and HTML automatically after they modify the API initial .yaml files.

We encourage collaborators to enhance the API, SCALe code, SCAIFE code, and the manual, both to tailor them to their needs and to send the project useful feature enhancements from which the SEI and all collaborators can benefit. To facilitate such enhancements, this version of the SCAIFE system also includes software and how-to information for API updates, HTML/markdown SCAIFE/SCALe manual updates, and code updates.

SCAIFE version 1.0.0 includes a quick-start demo documented in the SCAIFE HTML manual. The encrypted VM file in which we released SCAIFE v 1.0.0 includes code and tool output for two SCAIFE demonstration projects. This release also contains bug fixes, performance enhancements, and many additional new features. In particular, new and modified content in this version of the API includes

updated test suite and tool data structures
new taxonomy, language, and secondary message models
response messages with more data and precision (e.g., responses provide unique IDs of uploaded data, plus wider variety of error messages)
new methods to edit previously uploaded data

The GitHub publication of the SCAIFE API for SCAIFE System v 1.0.0, which is separate from the SCAIFE system release, added a new section, "How to get started with the API," to the README, plus an explanation of access-token use in SCAIFE. Reviewers had requested both.

Rationale Underlying the Release of SCAIFE System v 1.0.0

Before releasing SCAIFE v 1.0.0, we had released multiple beta versions of the SCAIFE system. Those beta versions (beta v1 in August 2019, beta v2 in September 2019, and beta v2.1 in October 2019) implemented many of the SCAIFE API-defined calls and much of the internal logic required for SCAIFE functionality. We demonstrated these beta versions with regression tests that we also ran automatically during our internal continuous-integration testing as we developed SCAIFE. When we released the beta versions to our collaborators to demonstrate the functionality we had implemented so far, we gave them detailed manuals explaining how to run our set of automated tests and how to inspect the Mongo and SQLite3 databases directly to verify that they contained the data expected if that functionality was working.

Although our collaborators were able to run these tests, this type of verification was not very user-friendly. Running the tests required many manual steps from the SCALe GUI in combination with commands from the terminal, each of which had to be done exactly as specified. Our collaborators had varying levels of experience with database inspections, and sometimes making a mistake in following the command-line instructions in exact sequence would throw off their entire set of results. Consequently, our collaborators asked for an all-GUI interface. Our goal was always to develop a research tool that code analysts--people adjudicating static-analysis meta-alerts--could interact with through a GUI. The beta releases did include some GUI features, and feedback from collaborators on early versions of those features helped us to improve the later versions. For example, as a result of feedback, we included more explanatory text. We also rearranged placement of menu items and added a "definitions" menu item.

Given the concerns expressed by our collaborators, SCAIFE version 1.0.0 represents a significant milestone: as of that release, users/testers/collaborators have been able to work fully from the GUI to create a project, specify a classifier, run a classifier, and see the results of classification in the GUI.

This method of releasing often to collaborators and getting their feedback is consistent with modern DevOps and Agile practices:

soliciting public feedback on APIs and SCALe versions that have been publicly released
soliciting collaborator feedback on APIs and full SCAIFE-system versions we have released to them
using the feedback to improve the code and tool or system while developing it
developing regression tests and running them automatically while developing the code
automating release builds. In this area, we have done a lot of work to containerize the SCAIFE system and to create automated VM scripts (Vagrantfiles), Dockerfiles, docker-compose files, and scripts that automatically add copyright markings and version numbers and remove proprietary data files. We have not yet completely automated our release builds, but we are currently working to finish that.

We design our code and APIs so that others can extend them to tailor the tools for their own use. We also invite them to contact us to contribute their code enhancements back. In addition, these DevOps methods help us maintain functionality in our complex systems during this project.

Looking Ahead to Next Steps for SCAIFE

Since the initial release of v 1.0.0 in late March 2020, we and our collaborators have continued to drive the evolution of SCAIFE by releasing interim versions with growing sets of functionality that bring us closer to our next goal: integrating SCAIFE with a continuous-integration (CI) system. Modifications for CI-SCAIFE integration are shown below in Figure 3:

Figure 3: Modified SCAIFE Architecture for Integration with a CI System

As with the v 1.0.0 release, we are implementing parts of this modified architecture, and our collaborators have been testing and providing feedback as we continue further development.

Figure 4 below shows a vision that integrates classifier use with CI systems:

Figure 4: Integration of Classifier Use with CI Systems

Figure 4 shows a CI workflow in which a member of the development or test team develops code on a new branch (or branches) that implements a new feature or a bug fix. The coder checks their source code into the repository (e.g., git commit and git push). Next, the CI server tests that code, first setting up what is needed to run the tests (e.g., creating files and folders to record logs and test artifacts, downloading images, creating containers, running configuration scripts), then starting the automated tests. Within the short CI timeframes, essential tests must be run, including unit tests that check that small bits of functionality continue to work, integration tests that check that larger parts of the system continue to interact as they should, and sometimes stress tests to ensure that system performance has not become much worse.

Sometimes (but not always) static analysis is done during the CI-server testing. When static analysis is run, it produces output with many alerts. Some meta-alerts may be false positives, and all of them must (normally) be examined manually to adjudicate them as true or false positives. In very short CI timeframes, however, dealing with static-analysis alerts is a low priority for development and test teams. Any failed unit or integration test must be fixed before the new code branch can be merged with the development branch, so those failures are a high priority. Beyond that, there are major time pressures from the CI cycle and from the other developers or testers who need that bug fix or new feature so that it does not block their own work or cause a merge conflict in the future.
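One way to picture how classifier confidence could ease that time pressure is a triage step that sends only a bounded number of meta-alerts to manual review in each CI cycle. The sketch below is purely illustrative. The thresholds, the review budget, and the function name are hypothetical, and it does not describe the design of a CI-integrated SCAIFE.

```python
def triage(confidences, true_threshold=0.9, false_threshold=0.1, review_budget=2):
    """Split meta-alerts by classifier confidence under a manual-review budget.

    confidences: dict mapping meta-alert ID -> estimated probability that the
    meta-alert is a true positive (hypothetical classifier output).
    """
    likely_true = sorted(m for m, c in confidences.items() if c >= true_threshold)
    likely_false = sorted(m for m, c in confidences.items() if c <= false_threshold)
    # Everything in between is uncertain; review the most likely flaws first.
    uncertain = sorted(
        (m for m, c in confidences.items() if false_threshold < c < true_threshold),
        key=lambda m: confidences[m],
        reverse=True,
    )
    return {
        "likely_true": likely_true,                    # candidates to fix now
        "likely_false": likely_false,                  # candidates to set aside
        "manual_review": uncertain[:review_budget],    # fits in this CI cycle
        "deferred": uncertain[review_budget:],         # left for a later cycle
    }


result = triage({1: 0.97, 2: 0.03, 3: 0.60, 4: 0.45, 5: 0.80, 6: 0.05})
```

With these example values, only two of the six meta-alerts land in the manual-review queue for the current cycle.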

I am currently leading a research project that is working to develop algorithms and system designs required to enable practical use of static-analysis classifiers during short CI cycles. The project aims to enable catching and fixing some code flaws identified by those tools early in the software-development lifecycle, and thus far more cheaply than later in the lifecycle. We are developing a CI-integrated version of SCAIFE to prototype it, which will be the focus of future blog posts.

Software Engineering Institute

SEI Blog