SCALe

Software
SCALe is a static analysis aggregation framework that has been developed mostly as a research prototype tool as part of the SEI’s research projects.
Publisher

GitHub

Abstract

SCALe is a static analysis aggregation framework that has been developed mostly as a research prototype tool as part of the SEI’s research projects. Many features have been added as part of research on modular static analysis classification and prioritization for optional use in a continuous integration (CI) system. The “scaife-scale” branch has the most recent features from this set of research projects, while the “main” branch contains a subset of the earlier features from the classification research projects. These features include the following:

  1. Visual “fusion” of different static analysis alerts for the same code location (line and file), including alerts from different tools, as long as they map to the same code flaw taxonomy ID (also known as a “condition,” for example a weakness such as CWE-190 or a code rule violation such as INT31-C). This visual fusion is optional and is toggled with a button in the GUI. It enables faster adjudication, requiring only one judgment of true or false positive for the fused “meta-alert.”
  2. Formally defined APIs in OpenAPI version 3 format, provided as .yaml and .json files. This enables much faster development with the SCALe code, since automated code generators can create server and client stubs for the formally defined APIs in a wide variety of common programming languages.
  3. Manual adjudications of true positives and false positives for static analysis results can automatically “cascade” to static analysis results for a later version of the code. This can be done automatically with a CI system using provided scripts and instructions, or alternatively, users can choose to do cascading using the SCALe GUI. Instructions for both types of automated cascading are provided in the SCALe manual (in HTML and markdown file format) that is included with the release. The included cascading feature (and script) uses the POSIX “diff” tool to match lines of code (which may be on different line numbers) within same-named files, then checks for matched meta-alerts for the same code flaw (e.g., CWE-190) on the matched line. If the line and code flaw match and a manual adjudication exists, then the manual adjudication is cascaded to the new static analysis alert on the later codebase. Cascading is implemented modularly, because our related research compared the results of different methods of adjudication cascading.
  4. Prioritization of static analysis results can be done in many ways. One method involves creating and using formulas: the user selects features, assigns a weight to each, and combines them using multiplication, division, addition, subtraction, and parentheses. Filtering can limit what the human adjudicator sees, for instance, so they see only alerts related to the CWE taxonomy or manually adjudicate only those alerts that were automatically classified with low confidence.
  5. A new feature enables optional random ordering of filtered results.
  6. Many unit and integration tests have been developed and published with the git repository, to support others as they develop the SCALe codebase using CI methods.
  7. We developed Dockerfiles, so SCALe can be quickly and simply started on a wide variety of Linux and macOS machines using the docker-compose command as specified in the SCALe manual (HTML and markdown).
  8. SCALe now uses a new set of primary and secondary adjudication labels and has a field for adjudicator notes. We developed a set of static analysis auditing rules and adjudication labels aimed at consistent, correct adjudication, documented separately in our papers and tutorials. SCALe’s modified adjudication labels and fields are designed to support those rules.
  9. We developed and published scripts in the SCALe repository that enable the automated creation of release zip files/tarballs, including automatically marking the files with copyright markings and version IDs that other developers can update by editing a couple of files. These scripts are documented in the SCALe manual (HTML and markdown).
  10. We modified SCALe so users can upload their own fields, which become associated with the static analysis results. These uploads are for advanced users who can work with SQLite databases and generate values. Uploaded fields can be used in mathematical priority schemes.
  11. We added a data sanitizer to SCALe that anonymizes sensitive fields, using an SHA-256 hash with salt, which enables analysis of static analysis features correlated with classification confidence or any other feature, without disclosing sensitive code, notes, filenames, project names, class names, paths, etc.
  12. We modified SCALe so it can more simply and modularly incorporate results from additional static analysis tools. In addition to handling SARIF format (the standardized format for static analysis output), it handles the DHS SWAMP tool format, plus the unique formats of many different static analysis tools.
  13. SCALe can be started up using “experiment” startup files (.JSON format) to automatically set up SCALe projects and GUI views (fusion, filtering, priority ordering, etc.) as described in the SCALe manual.
  14. The entirety of SCALe has been designed to optionally be used as one module within a modular static analysis results classification system (optionally CI-integrated) called SCAIFE. More information about SCAIFE has been published elsewhere by the SEI.
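
The fusion described in item 1 can be illustrated with a minimal Python sketch. It is not SCALe's code: the `Alert` class, its field names, and the `fuse` function are hypothetical, and the only behavior taken from the description above is that alerts sharing the same file, line, and condition are grouped into one meta-alert.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; SCALe's real schema may differ."""
    tool: str
    path: str
    line: int
    condition: str  # taxonomy ID such as "CWE-190" or "INT31-C"


def fuse(alerts):
    """Group alerts that share (path, line, condition) into meta-alerts."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.path, alert.line, alert.condition)].append(alert)
    return dict(groups)


alerts = [
    Alert("toolA", "src/add.c", 42, "CWE-190"),
    Alert("toolB", "src/add.c", 42, "CWE-190"),  # same location and condition
    Alert("toolA", "src/add.c", 42, "INT31-C"),  # same location, other condition
]
meta_alerts = fuse(alerts)
```

Here the two CWE-190 alerts from different tools fall into one group, so a single adjudication would cover both, while the INT31-C alert stays separate.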
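
The cascading logic in item 3 can be sketched in the same spirit. SCALe's script uses the POSIX “diff” tool; the sketch below substitutes Python's standard `difflib` module to stay self-contained, so it is an approximation of the idea, not the shipped implementation. The function names and the verdict strings are assumptions, and the example handles one same-named file.

```python
import difflib


def match_lines(old_lines, new_lines):
    """Map 1-based line numbers of unchanged lines from old to new."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def cascade(old_verdicts, new_alerts, mapping):
    """Copy a verdict when the matched line carries the same condition.

    old_verdicts: {(old_line, condition): verdict}
    new_alerts:   set of (new_line, condition) found in the later version
    """
    cascaded = {}
    for (old_line, condition), verdict in old_verdicts.items():
        new_line = mapping.get(old_line)
        if new_line is not None and (new_line, condition) in new_alerts:
            cascaded[(new_line, condition)] = verdict
    return cascaded


old = ["int a = x;", "int b = y;", "int c = a + b;"]
new = ["#include <limits.h>", "int a = x;", "int b = y;", "int c = a + b;"]
mapping = match_lines(old, new)
result = cascade({(3, "CWE-190"): "true"}, {(4, "CWE-190")}, mapping)
```

The inserted `#include` shifts every line down by one, so the verdict recorded on old line 3 is carried to the meta-alert on new line 4.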
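
A weighted priority formula of the kind described in item 4 might be evaluated as follows. This is a sketch under stated assumptions: the field names (`severity`, `likelihood`, `remediation_cost`), the weights, and the `evaluate` helper are all hypothetical, and SCALe's own formula syntax and evaluator may differ. The sketch accepts only numbers, field names, parentheses, and the four arithmetic operators.

```python
import ast
import operator

# Only the four operators named in the feature description are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula, fields):
    """Evaluate a restricted arithmetic formula over one alert's fields."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return fields[node.id]
        raise ValueError("unsupported expression element")

    return walk(ast.parse(formula, mode="eval").body)


formula = "(2 * severity + 3 * likelihood) / remediation_cost"
alerts = [
    {"id": 1, "severity": 3, "likelihood": 2, "remediation_cost": 2},
    {"id": 2, "severity": 1, "likelihood": 1, "remediation_cost": 1},
]
# Highest score first, so the adjudicator sees it at the top.
ranked = sorted(alerts, key=lambda a: evaluate(formula, a), reverse=True)
```

Walking the parsed expression tree, instead of calling `eval`, keeps a user-supplied formula from running arbitrary code.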
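
The salted SHA-256 anonymization in item 11 can be sketched with Python's standard library. How SCALe combines the salt with each value, and which fields it treats as sensitive, are not stated above; prepending the salt and hashing only `path` are assumptions made for illustration.

```python
import hashlib
import secrets


def sanitize(value: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of the salt followed by the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


salt = secrets.token_bytes(16)  # kept private by whoever runs the sanitizer
row = {"path": "src/secret/module.c", "line": 42, "condition": "CWE-190"}

# Hash the sensitive field; leave the non-sensitive analysis features intact.
sanitized = {**row, "path": sanitize(row["path"], salt)}
```

Because the same salt and value always yield the same digest, alerts from the same file still group together after sanitization, which is what allows features to be correlated without disclosing the original names.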