SCALe: a Static Analysis Alert Auditing Tool
by Lori Flynn
Software Security Researcher
CERT Secure Coding Group
This post was co-authored by Ebonie McNeil.
Static analysis tools analyze code without executing it to identify potential flaws in source code. These tools produce a large number of alerts with high false-positive rates that an engineer must painstakingly examine to find legitimate flaws. As described in Lori's first blog post on this topic, we in the SEI's CERT Division have been developing the SCALe (Source Code Analysis Laboratory) tool since 2010 as part of our research on new ways to help analysts be more efficient and effective at auditing static analysis alerts.
We add new features to SCALe to enable research: for instance, to gather metrics on the effect of those features, or to improve the data quality of SCALe audit archives so that better automated alert classifiers can be developed. In August 2018 we released SCALe version 2 to the public as open source via GitHub. This blog post describes new features and capabilities we have added in our ongoing development of SCALe.
Lori Flynn has been the principal investigator leading research on static analysis alert classification and advanced alert prioritization since late 2015, and her research project teams have done almost all the SCALe development during that time. Co-author Ebonie McNeil has been involved with this research since December 2017 and has made significant contributions to the development of SCALe. SCALe developers on Lori's research project teams since late 2015 also include Aubrie Woods, David Svoboda, William Snavely, Derek Leung, Jiyeon Lee, Lucas Bengston, Jennifer Burns, Christine Baek, Baptiste Vauthy, Shirley Zhou, Maria Rodriguez De La Cruz, and Elliot Toy.
The latest release of the SCALe tool is version 3.0, which is currently a non-public research prototype that we share only with collaborators. In this release, we have added many features and fields to provide automated classification and advanced prioritization of static analysis alerts. We describe these enhancements in this post. We also describe plans to connect this enhanced version of SCALe to an architecture that will provide classification and prioritization of alerts via application programming interface (API) calls. The research project team is developing a prototype that instantiates the architecture, and we are working to complete the prototype and integrate it with the latest version of SCALe.
SCALe provides a graphical user interface (GUI) front end for auditors to examine alerts from one or more static analysis tools and the associated code, to make determinations about an alert (e.g., mark it as true or false), and export the project audit information to a database or file. The publicly released version of SCALe can be used for auditing software in four languages (C, C++, Java, and Perl) and two code-flaw taxonomies [SEI CERT coding rules and MITRE's Common Weakness Enumeration (CWE)]. To learn more about SCALe, see SCALe: A Tool for Managing Output from Static Analysis Tools.
The two figures below illustrate the evolution of the SCALe tool from version 1.0 (prior to the research project enhancements begun in late 2015) to version 3.0, which is the current working version.
Figure 1: SCALe v1 Exported Database Format
The following modifications to the initial version of the SCALe databases include new features important for prioritization and classification integration:
The database design in Figure 2 shows the new format of an exported SCALe project (in sqlite3 database format).
Figure 2: SCALe v 3.0 Exported Database Format
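Because an exported project is an sqlite3 database, its tables and columns can be listed with standard SQLite tooling. The snippet below is a minimal sketch using Python's built-in `sqlite3` module. The helper name `describe_schema` is our own, and the sketch makes no assumptions about which tables a SCALe export actually contains.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]
    return schema
```

To inspect a real export, open it with `sqlite3.connect(path_to_exported_file)` and pass the connection to `describe_schema`.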
The following sections briefly describe new functionalities added in v 3.0 of SCALe that weren't described in our previous blog post.
Selection of a Priority Scheme
Prioritization-scheme selection allows users to prioritize static analysis alerts using factors they care about, as shown in the figure below. The formulas can combine classifier confidence and other values (e.g., risk, cost) used by the system.
Figure 3: Save Prioritization Scheme
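To make the idea of a prioritization formula concrete, here is a minimal sketch of one possible scheme. The weighted-sum form, the `MetaAlert` class, and the field and weight names are illustrative assumptions, not the formula syntax or data model that SCALe uses. The sketch only shows how classifier confidence can be combined with values such as risk and cost to order alerts.

```python
from dataclasses import dataclass


@dataclass
class MetaAlert:
    # Hypothetical fields chosen for illustration only.
    id: int
    confidence: float  # classifier confidence that the alert is a true positive
    risk: float        # estimated risk if the flaw is real
    cost: float        # estimated cost to audit or repair


def priority(alert, weights):
    """Weighted sum: confidence and risk raise priority, cost lowers it."""
    return (
        weights["confidence"] * alert.confidence
        + weights["risk"] * alert.risk
        - weights["cost"] * alert.cost
    )


def rank(alerts, weights):
    """Return alerts ordered from highest to lowest priority."""
    return sorted(alerts, key=lambda a: priority(a, weights), reverse=True)
```

A team that cares most about classifier confidence would raise the `confidence` weight, while a team with a tight repair budget would raise the `cost` weight.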
Ability to Upload Additional Fields
Uploaded fields can currently be used in prioritization formulas, but cannot be viewed in the GUI. Future versions of SCALe will include the fields in the GUI.
The upload-fields option is intended for advanced users who can work with SQL databases and who have or can generate values for new fields (e.g., based on advanced logic using the other alert fields or based on proprietary data). An extended and more user-friendly version of the concept, however, could be added without technical difficulty beyond standard development. This addition would enable less technical users to generate values for new fields using mathematical formulas, advanced logic, and data beyond that provided in the initial SCALe database.
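As a rough illustration of the kind of SQL work this involves, the sketch below adds a new numeric column to a table and fills it with user-generated values. The helper `add_field` and any table, column, and key names passed to it are hypothetical. It does not reproduce SCALe's actual upload mechanism or schema, only the general idea of attaching a new per-alert field in an SQLite database.

```python
import sqlite3


def add_field(conn, table, column, values, key="id"):
    """Add a REAL column to `table` and set it for each row named in `values`.

    `values` maps a row's key (for example, a meta-alert id) to the new
    field's value. All names here are placeholders for illustration.
    """
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" REAL')
    conn.executemany(
        f'UPDATE "{table}" SET "{column}" = ? WHERE "{key}" = ?',
        [(value, row_key) for row_key, value in values.items()],
    )
    conn.commit()
```

The values themselves could come from any source the user controls, such as a script that applies custom logic to the existing alert fields.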
Ability to Select and Run a Classification Scheme
Although classification is not available in the current version of SCALe, we have integrated many features required for classification into it. Interfaces are provided for user testing and feedback.
Figure 4: Selection of a Classifier Scheme
After a classifier type is selected, a popup window appears with the following options:
After selecting a classification scheme, the user will be able to click a "Classify" button that classifies the meta-alerts (e.g., predicts a level of confidence that each meta-alert is true or false).
This feature allows a user to take determinations made from a previous SCALe audit and automatically apply them to alerts generated by a new SCALe audit. The section Upload Determinations from Project in the Edit Project screen (shown in Figure 5, with red arrows pointing to where users must make selections) provides this functionality. It uses the UNIX diff utility to determine if a code line within a file from a previous version of a code base matches a code line within the current version of the code base. If the lines match and there was a previously made determination for the meta-alert, then it infers the same determination for the current meta-alert. A new note with a timestamp is added that identifies the determination as being cascaded.
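The line-matching idea can be sketched in a few lines of Python. This is an illustrative approximation, not SCALe's implementation: it uses Python's `difflib` as a stand-in for the UNIX diff utility, keys determinations by line number only (a real audit would also need to match the file and the flagged condition), and the function names are our own.

```python
import difflib
from datetime import datetime, timezone


def map_unchanged_lines(old_lines, new_lines):
    """Map 1-based old line numbers to new line numbers for unchanged lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def cascade(old_determinations, new_alert_lines, old_lines, new_lines):
    """Carry determinations forward to alerts on lines that did not change.

    old_determinations: {old line number: verdict} from the previous audit.
    new_alert_lines: line numbers with alerts in the current code base.
    Returns {new line number: (verdict, note)} for alerts that inherit a verdict.
    """
    new_to_old = {
        new: old for old, new in map_unchanged_lines(old_lines, new_lines).items()
    }
    stamp = datetime.now(timezone.utc).isoformat()
    cascaded = {}
    for line in new_alert_lines:
        old = new_to_old.get(line)
        if old in old_determinations:
            # Record a timestamped note marking the verdict as cascaded.
            cascaded[line] = (old_determinations[old], f"Cascaded on {stamp}")
    return cascaded
```

Alerts on lines that were edited or newly added receive no inherited verdict and still need a manual determination.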
Future work will analyze the accuracy of classifiers created using cascaded determinations compared to classifiers using only regular determinations.
Figure 5: Upload Determinations Screen
Next Steps and Future Work
The research project that developed SCALe v3 is developing an architecture with APIs and an open-source prototype system to enable users to
In the future, SCALe will interact with the other parts of the architecture and three other servers will provide external functionality. Currently, the enhancements to SCALe provide advanced prioritization and much of the classifier functionality that will be required in SCALe for a fully integrated system. Our research project team will soon publish the architecture model and the beta version of our API for the architecture. After that, we will publish a third blog post in this series.
We hope that potential collaborators will read about what we are doing with SCALe and contact us if they want to collaborate. (Collaborators are invited to implement the API from their alert auditing tool and provide feedback, to test SCALe and provide feedback, and/or to send us sanitized alert audit data with manual auditor determinations.) Other interested users should watch the SEI website for announcements about the planned public release of SCALe v 3.
Read the blog posts in this series of posts on SCALe.
Read the SEI press release, SEI CERT Division Releases Downloadable Source Code Analysis Tool.
Read the SEI blog post, Static Analysis Alert Test Suites as a Source of Training Data for Alert Classifiers.
Read the Software QUAlities and their Dependencies (SQUADE, ICSE 2018 workshop) paper Prioritizing Alerts from Multiple Static Analysis Tools, Using Classification Models.
Read the SEI blog post, Prioritizing Security Alerts: A DoD Case Study (In addition to discussing other new SCALe features, it details how the audit archive sanitizer works.)
Read the SEI blog post, Prioritizing Alerts from Static Analysis to Find and Fix Code Flaws.
View the presentation Challenges and Progress: Automating Static Analysis Alert Handling with Machine Learning.
View the presentation (PowerPoint): Hands-On Tutorial: Auditing Static Analysis Alerts Using a Lexicon and Rules.
Watch the video: SEI Cyber Minute: Code Flaw Alert Classification.
View the presentation: Rapid Expansion of Classification Models to Prioritize Static Analysis Alerts for C.
View the presentation: Prioritizing Alerts from Static Analysis with Classification Models.
Read the SEI paper, Static Analysis Alert Audits: Lexicon & Rules, presented at the IEEE Cybersecurity Development Conference (IEEE SecDev), which took place in Boston, MA on November 3-4, 2016.
Read the SEI Technical Note, Improving the Automated Detection and Analysis of Secure Coding Violations.
Read the SEI Technical Note, Source Code Analysis Laboratory (SCALe).
Read the SEI Technical Report, Source Code Analysis Laboratory (SCALe) for Energy Delivery Systems.
Watch the SEI Webinar, Source Code Analysis Laboratory (SCALe).
Read the SEI Technical Note, Supporting the Use of CERT Secure Coding Standards in DoD Acquisitions.