An Application Programming Interface for Classifying and Prioritizing Static Analysis Alerts
PUBLISHED IN
Secure Development
In this post, we describe the Source Code Analysis Integrated Framework Environment (SCAIFE) application programming interface (API). SCAIFE is an architecture for classifying and prioritizing static analysis alerts. It is designed so that a wide variety of static analysis tools can integrate with the SCAIFE system using the API. The API is pertinent to organizations that develop or research static analysis alert auditing tools, aggregators, and frameworks.
Benefits of SCAIFE
The planned SCAIFE system will provide an architecture with APIs and an open-source prototype system that offers users the following benefits:
- Analysts with no knowledge of machine learning can quickly start to use automated classifiers for static analysis alerts. The classification system will not require a labeled audit archive to be provided in advance, since it uses test suites in a new way. It also will not require users to create their own frameworks to use the classifiers.
- Analysts and organizations can quickly apply formulas that prioritize static analysis alerts by using factors they care about. These prioritization formulas can combine various fields, including classifier-derived confidence, with mathematical operators.
- Developers and researchers can employ the API definition to build upon the original prototype system, enabling the use of additional flaw-finding static analysis tools, code metrics tools (such as CCSM or lizard), adaptive heuristics, classification techniques, and so forth.
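To make the second benefit more concrete, a prioritization formula can be pictured as an arithmetic expression over alert fields. The sketch below is not SCAIFE code: the field names (confidence, severity, remediation_cost), their scales, and the formula itself are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlertFields:
    """Hypothetical per-alert fields; names are illustrative, not from the SCAIFE API."""
    alert_id: str
    confidence: float      # classifier-derived confidence that the alert is true (0..1)
    severity: int          # assumed scale: 1 (low) to 3 (high)
    remediation_cost: int  # assumed scale: 1 (cheap) to 3 (expensive)


def priority(a: AlertFields) -> float:
    """Example user-defined formula: confidence * severity / remediation_cost."""
    return a.confidence * a.severity / a.remediation_cost


def rank(alerts):
    """Return the alerts sorted from highest to lowest priority."""
    return sorted(alerts, key=priority, reverse=True)


alerts = [
    AlertFields("a1", confidence=0.9, severity=1, remediation_cost=3),
    AlertFields("a2", confidence=0.5, severity=3, remediation_cost=1),
    AlertFields("a3", confidence=0.8, severity=2, remediation_cost=2),
]
ranked_ids = [a.alert_id for a in rank(alerts)]
```

In this sketch, an analyst who cares about different factors would change only the body of priority; the ranking step stays the same.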
Why Read About the Beta API Definition?
We have provided this beta version of the SCAIFE API (version 0.0.2) so developers can estimate the development effort required to modify their tools to make and respond to SCAIFE API calls. Representatives from multiple organizations have told us that they want to understand what level of investment would be required to make their tools capable of using classifiers and advanced prioritization with SCAIFE. We also want this beta API definition to generate feedback from developers and organizations interested in implementing the SCAIFE API, so that when we release version 1.0.0 publicly, it will be easy for developers of a wide variety of static analysis software to use.
Modifications between SCAIFE API versions 0.0.1 and 0.0.2 include adding a registration server, as well as adding and modifying many API calls and their associated data models. After examining APIs for the Department of Homeland Security's Software Assurance Marketplace (SWAMP) and the U.S. Army's Combat Capabilities Development Command (CCDC) C5ISR Center's Software Assurance Tool (SwAT), we also added several fields to SCAIFE API v0.0.2 to enable easier future integration of those tools with SCAIFE. Other modifications were based in part on feedback received from organizations that are interested in integrating their software with SCAIFE via API calls.
While completing SCAIFE API version 1.0.0, the SCAIFE development team is simultaneously completing a prototype instantiation of the architecture, a multi-server software system whose servers communicate using SCAIFE API calls. The SCAIFE prototype is intended to be used by engineers to audit alerts from static analysis tools with a graphical user interface (GUI) front end. The audited archive data is subsequently stored in the system's databases. The SCAIFE prototype's back-end system also supports automated alert classification (true and false) and advanced alert prioritization based on mathematical, user-defined formulas.
The SCAIFE prototype includes the latest version of SCALe, the SEI-developed alert auditing framework that provides a GUI front-end for examining code and marking determinations (e.g., true or false), and a back end that stores audit data in a database archive. We have modified SCALe to facilitate integration with SCAIFE, and to include features for alert prioritization using mathematical formulas, automated alert classification, and other SCAIFE functionality. The latest version of SCALe includes modifications to enable different modes of operation: SCAIFE-connected, SCALe-only, and Demo modes. This SCAIFE prototype can be used as is, or developers can choose to either swap out or modify particular servers.
Although the SCAIFE architecture shown in Figure 1 includes five servers, the API definition published in our recent white paper, SCAIFE API Definition Beta Version 0.0.2 for Developers, has only four sections. These sections describe the API function calls and responses of four of the servers. The User-Interface (UI) Module has no section of its own because the other servers do not make API calls to it. Calls from the UI Module to the other servers are listed in each of the four sections.
The UI Module represents existing analysis tools that display alert data in a GUI front end, including tool aggregators such as SCALe, SWAMP, and SwAT, and it will make API calls to the other four servers. Each API definition section is further categorized by the source and destination modules of its API calls. For instance, the Registration and Login Module API definition section contains only one category of API calls, labeled UIToRegistration: the calls originate from the UI Module (the source) and are sent to the Registration Module (the destination).
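The source-to-destination naming of these categories can be sketched in a few lines. The module names below are the ones mentioned in this post; the helper function and its checks are hypothetical and are not part of the SCAIFE API.

```python
# Module names taken from this post; the helper itself is a hypothetical illustration.
MODULES = {"UI", "Registration", "DataHub", "Stats"}


def category_label(source: str, destination: str) -> str:
    """Build a label such as 'UIToRegistration' from source and destination modules."""
    for name in (source, destination):
        if name not in MODULES:
            raise ValueError(f"unknown module: {name}")
    if destination == "UI":
        # The post states that the other servers do not make API calls to the UI Module.
        raise ValueError("no API calls are made to the UI Module")
    return f"{source}To{destination}"


labels = [category_label("UI", "Registration"), category_label("DataHub", "Stats")]
```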
We developed the API definition using the Swagger/OpenAPI open-source development toolset. We chose this toolset because it is widely used (approximately 10,000 downloads daily), and provides automated code generation from API specifications and testing support. These toolset features not only support SEI development of the SCAIFE API and the prototype instantiation of the SCAIFE architecture, but they also support other developers' work to instantiate the SCAIFE API within their own tools.
API Definition YAML File
We published a YAML-formatted file specifying the SCAIFE API, available at the CMU-SEI GitHub site "SCAIFE API" for free downloads by the public. The YAML specification provides the SCAIFE API definition beta version 0.0.2, in a format that developers can easily use to view, modify, and automatically generate code (e.g., with the Swagger Editor and Swagger Codegen tools). The YAML file was almost entirely created manually by SEI developers. The only entries that were auto-generated by Swagger tools within the YAML file are the examples.
The API Definition in the Paper and How to Use It
The SCAIFE API definition is provided in the paper, in text originally generated by SEI developers in YAML. We used the Swagger Codegen tool to produce an HTML version of the API documentation that we copied to the paper, and then slightly modified the original output format to improve readability. The API version included in the paper is more accessible to readers with diverse job titles and technical capabilities, since it does not require familiarity with YAML format, nor the installation of additional software (e.g., Swagger Editor) to facilitate viewing.
You can view the interface methods in two ways. If you are interested in a particular module, click on the hyperlink for that module's API Definition, which will take you to the API calls for that module. You can also find an API call directly by using the links in the Summary of API Methods section.
For example, to reach the interface PUT /projects/{project_id}/{package_id}/alerts in the DataHubToStats section, you can start by clicking on the Rapid Models Statistics Module API Definition link, or by clicking on the PUT /projects/{project_id}/{package_id}/alerts link under the list of statistics methods. Either route takes you to the API call definition.
The PUT request (the /projects/{project_id}/{package_id}/alerts API call) in the DataHubToStats section is used to forward new alerts from the DataHub to the Stats module. As you can see, this method expects two parameters in the URL path, denoted by the curly brackets around the project_id and package_id variables, and specified under the Path parameters subheading. All API calls for this architecture accept and return JavaScript Object Notation (JSON) objects, which are defined under the Consumes and Produces keywords.
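A minimal sketch of how a client might assemble this call is shown here, using only the Python standard library. The base URL, the example identifiers, and the body keys (meta_alerts, alerts) are assumptions for illustration; consult the API definition for the actual schema and for any additional headers a real server requires. The request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL for a Stats module; not taken from the SCAIFE documentation.
BASE_URL = "http://localhost:8080"


def build_put_alerts_request(project_id: str, package_id: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) PUT /projects/{project_id}/{package_id}/alerts."""
    # Substitute the two path parameters, percent-encoding them for safety.
    path = f"/projects/{quote(project_id, safe='')}/{quote(package_id, safe='')}/alerts"
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",  # the call consumes JSON
            "Accept": "application/json",        # the call produces JSON
        },
    )


# The body keys here are placeholders, not confirmed field names.
req = build_put_alerts_request("proj-1", "pkg-7", {"meta_alerts": [], "alerts": []})
```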
The request body of this particular API call expects a multiple_alerts object. To identify the format for multiple alerts, click on the hyperlink, which will direct you to the model definition. Here you will see that the multiple alerts object can contain an array of meta_alert objects and/or an array of alert objects. Clicking on the meta_alert link redirects you to the meta_alert object's definition, which is as follows:
meta_alert
- meta_alert_id: String
- alert_ids (optional): array[String]
- filepath (optional): String
- line_start (optional): Integer
- condition_id: String
- determinations (optional): determination
- verdict (optional): map[String, array[String]]
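One way a client could mirror this model in code is sketched below. The field names and types follow the meta_alert listing above; the class name, the choice to omit unset optional fields from the JSON, and the untyped handling of determinations (whose own fields are not shown in this post) are assumptions.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass
class MetaAlert:
    """Mirrors the meta_alert fields listed above; optional fields default to None."""
    meta_alert_id: str
    condition_id: str
    alert_ids: Optional[List[str]] = None
    filepath: Optional[str] = None
    line_start: Optional[int] = None
    verdict: Optional[Dict[str, List[str]]] = None
    # 'determinations' is an embedded object whose fields are not shown in this post,
    # so it is kept as an untyped dictionary here (an assumption).
    determinations: Optional[dict] = None

    def to_json_dict(self) -> dict:
        """Drop unset optional fields so they are omitted from the JSON payload."""
        return {k: v for k, v in asdict(self).items() if v is not None}


ma = MetaAlert(meta_alert_id="m1", condition_id="c42", alert_ids=["a1", "a2"], line_start=17)
payload = json.dumps(ma.to_json_dict(), sort_keys=True)
```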
A meta_alert object also contains an additional embedded object, determinations, whose definition can be reached in the same way. To return to the top level of a section, you can use the Up hyperlink. From the meta_alert object, clicking Up will take you to the beginning of the Summary of API Models section. From there, to return to the list of API calls, you can click on the Jump to Methods hyperlink. You can then explore the path for another API call or take a similar route to find other object formats.
Background
Static analysis tools analyze code without executing it to identify potential flaws in source code. These tools produce a large number of alerts with high false-positive rates that engineers must painstakingly examine to find legitimate flaws. As described in the first blog post in this series, we in the SEI's CERT Division have been developing the SCALe (Source Code Analysis Laboratory) tool since 2010 as part of our research on new ways to help analysts be more efficient and effective at auditing static analysis alerts.
In August 2018, we released SCALe 2.0.0 to the public as an open-source project via GitHub. In the second blog post in this series, we described new features and capabilities we have added in our ongoing development of SCALe. A full description of these features and capabilities can be found in our recently published technical report, Integration of Automated Static Analysis Alert Classification and Prioritization with Auditing Tools: Special Focus on SCALe. In the report, we described plans to connect this enhanced version of SCALe to an architecture that will provide classification and prioritization of alerts via API calls. The report also provided the first beta version of the API definition (version 0.0.1, from September 2018).
Looking Ahead
Compared to the beta API definitions, the published SCAIFE API v1.0.0 definition will include implementation details, the architecture description, motivations, and a prototype system. We also plan to modify the auto-generated HTML format of the API definition to increase usability (e.g., modifying color, boldness, and text indentation to clarify heading level and object type).
Future versions of the SCAIFE API will also incorporate responses to feedback from API reviewers, prototype testers, API implementers, and readers of this blog post. We are coordinating publication of the SCAIFE API version 1.0.0 with the release of the system prototype. Future iterations of this API will also:
- include enhanced functionality related to adaptive heuristics
- provide additional API calls that return a wider variety of performance metrics to help non-experts make better choices
- add functions that support business continuity in the event of partial system failure (e.g., server failure or data loss)
We will initially distribute the prototype to research-project collaborators who will test it and provide feedback. We invite readers of the white paper and this blog post who are interested in testing the SCAIFE prototype to contact us.
Additional Resources
Read the SEI white paper, SCAIFE API Definition Beta Version 0.0.2 for Developers
Examine the YAML specification of the SCAIFE API Beta Version 0.0.2
Read the SEI technical report, Integration of Automated Static Analysis Alert Classification and Prioritization with Auditing Tools: Special Focus on SCALe.
Read the blog posts in this series of posts on SCALe.
Watch the SEI webinar, Improve Your Static Analysis Audits Using CERT SCALe's New Features.
Read the SEI press release, SEI CERT Division Releases Downloadable Source Code Analysis Tool.
Read the SEI blog post, Test Suites as a Source of Training Data for Static Analysis Alert Classifiers.
Read the Software QUAlities and their Dependencies (SQUADE, ICSE 2018 workshop) paper, Prioritizing Alerts from Multiple Static Analysis Tools, Using Classification Models.
Read the SEI blog post, Prioritizing Security Alerts: A DoD Case Study. (In addition to discussing other new SCALe features, it details how the audit archive sanitizer works.)
View the presentation, Challenges and Progress: Automating Static Analysis Alert Handling with Machine Learning.
View the presentation (PowerPoint), Hands-On Tutorial: Auditing Static Analysis Alerts Using a Lexicon and Rules.
Watch the video, SEI Cyber Minute: Code Flaw Alert Classification.
View the presentation, Rapid Expansion of Classification Models to Prioritize Static Analysis Alerts for C.
View the presentation, Prioritizing Alerts from Static Analysis with Classification Models.
Look at the SEI webpage focused on our research on static analysis alert automated classification and prioritization.
Read the SEI paper, Static Analysis Alert Audits: Lexicon & Rules, presented at the IEEE Cybersecurity Development Conference (IEEE SecDev), which took place in Boston, MA on November 3-4, 2016.
Read the SEI paper, SCALe Analysis of JasPer Codebase.
Read the SEI technical note, Improving the Automated Detection and Analysis of Secure Coding Violations.
Read the SEI technical note, Source Code Analysis Laboratory (SCALe).
Read the SEI technical report, Source Code Analysis Laboratory (SCALe) for Energy Delivery Systems.
Read the SEI technical note, Supporting the Use of CERT Secure Coding Standards in DoD Acquisitions.