Two Tools for Malware Analysis and Reverse Engineering in Ghidra

Since the public release by the National Security Agency of the software reverse engineering (SRE) tool suite Ghidra, our team of researchers at the SEI’s CERT Division have been working to create a new suite of tools to make it easier for analysts to take advantage of Ghirdra’s capabilities and interface. This new suite of tools, known as Kaiju, helps malware analysis and reverse engineering take advantage of Ghidra’s capabilities and interface, which are relatively easy to use during malware analysis. The tools included with Kaiju give malware analysts many advantages as they are faced with increasingly diverse and complex malware threats.

One of the more complex plugins included in Kaiju is a Satisfiability Modulo Theories (SMT)-based path analysis tool named GhiHorn. In this post we present and discuss the use of two Ghihorn tools: API Analyzer and Path Analyzer. Both of these tools enable analysts to find paths in executables based on numerous criteria, such as user-specified start and end addresses or passing through specific program points. Our previous work has shown that executable path finding can support various malware analysis activities. One example that we have previously cited is a malware program that contains a check for the presence of a debugger, a common technique meant to hinder analysis. The analyst may wish to know if there is a viable execution path that circumvents this check and, if there is a path, what inputs and environmental conditions are needed to traverse it. Both PathAnalyzer and ApiAnalyzer can be used to address and solve these types of problems.

GhiHorn Tools: ApiAnalyzer and PathAnalyzer

On top of the GhiHorn platform we have implemented two path analysis tools: PathAnalyzer and ApiAnalyzer. Both tools use Horn encoding and solving as their primary strategy for path analysis, and both emphasize the usefulness of results.

PathAnalyzer

PathAnalyzer allows an analyst to determine if a path exists between two locations in a program and, if so, what values are required to take that path. PathAnalyzer integrates with Ghidra as a plugin (Figure 1). An analyst must provide start and goal addresses to seed the analysis. If a path is feasible, PathAnalyzer returns an answer in the form of a graph containing information about the recovered path.

Figure 1: PathAnalyzer user interface

An example of a recovered path is shown in Figure 2. Each vertex in the graph represents a basic block traversed. GhiHorn operates on live variables present in each basic block, with an emphasis on variables that are linked to decompilation data structures. The variable values in the vertices are the values that Z3 selected to reach the goal address. The graph is clickable in the Ghidra user interface, meaning that selecting a node will highlight the relevant decompilation and disassembly instructions for the answer vertex.

Figure 2: PathAnalyzer answer graph

If no path is found, then a counterexample graph is generated showing how the query failed. In this version of GhiHorn the counterexample is not very meaningful. Currently, the actual results for unsatisfiable queries are presented as they are returned by Z3: as a graph with Boolean values assigned to states found in the encoding. For example, Table 1 shows the disassembly for a function. Attempting to find a path from the beginning of the function at address 0x100003f50 to the address 0x100003f9d is unsatisfiable because local_1c is always an even number, and the condition guarding the instruction at 0x100003f9d checks for an odd number. The disassembly is shown because the default result for an unsatisfiable answer is to return Boolean values (true or false) for block addresses. Aside from learning that this particular attempt to find a path is infeasible, we are still working to make counterexamples more useful.

Table 1: Unsatisfiable example

ApiAnalyzer

The second tool based on GhiHorn, named ApiAnalyzer, uses binary path and data flow analysis to reason about program behaviors. Like its Pharos namesake, the premise of GhiHorn ApiAnalyzer is that interesting program behaviors can be determined by finding API function call sequences that share common data. For example, Figure 3 below shows the common way to list the processes running on a system using the Windows Tool Help Functions. The API calls used are highlighted in red.

Figure 3: Source code to iterate through running processes using API calls to Windows Tool Help Functions

The GhiHorn version of ApiAnalyzer formulates a program using the same hornified control flow graph as PathAnalyzer. This approach is reasonable given that the API analysis problem can be framed as a path reachability problem with a few additional constraints namely that the path must traverse a specific sequence of API function calls. More specifically the additional constraints are introduced to tally the API function calls covered and to track variable values passed between API functions.

The API call sequences and variables to track are specified as signatures using a JSON-based format similar to that of the Pharos ApiAnalyzer. For example, a simple signature to link the opening and closing of the same file is shown in Table 2. Note that the two functions operate on the same data: a variable labeled HANDLE. CreateFileA returns the initial value, and CloseHandle must operate on that same value for a match to exist.

Table 2 : API signature

The ApiAnalyzer plugin is shown in Figure 4. In this image, two simple signatures are being searched for: reading and writing the same file, as well as opening and closing the same file. Both signatures are found in the program, and an answer graph is presented. This graph is shown in Figure 5.

Figure 4: Ghidra ApiAnalyzer user interface

Figure 5: ApiAnalyzer answer graph

ApiAnalyzer’s answer graph is basically the same as the graph generated by PathAnalyzer. Imported API functions for which there is an implementation are highlighted in cyan. API functions for which there is no implementation are colored magenta. As before, the graph includes variable values that are used to construct the path, and the graph can be used for navigation in the Ghidra disassembly and decompiler windows.

The Future of GhiHorn

GhiHorn provides the capability to reason about program paths in Ghidra. The Horn-based tools provided by GhiHorn and the tight integration with Ghidra’s user interface makes formal program analysis tools accessible to malware analysts and reverse engineers. Future versions of the plugin will continue to provide new and better ways to consume results and to model more complex code structures. We also plan on generating better counterexamples to make it more evident why a path could not be recovered. Finally, we’re working on a better memory model that better represents a real program’s address space.

GhiHorn is publicly available as part of the CERT Kaiju Framework. The source code and build instructions for GhiHorn (and all Kaiju tools) are available on GitHub. We welcome suggestions for improvements or for new utilities that would be useful for building new tools to support malware analysis and reverse engineering.

Software Engineering Institute

SEI Blog