Using OOAnalyzer to Reverse Engineer Object-Oriented Code with Ghidra
Object-oriented programs continue to pose many challenges for reverse engineers and malware analysts. C++ classes tend to result in complex arrangements of assembly instructions and sophisticated data structures that are hard to analyze at the machine code level. We've long sought to simplify the process of reverse engineering object-oriented code by creating tools, such as OOAnalyzer, which automatically recovers C++-style classes from executables.
OOAnalyzer includes utilities to import its results into other reverse engineering frameworks, such as the IDA Pro Disassembler. I'm pleased to announce that we've updated our Pharos Binary Analysis Framework on GitHub to include a new plugin that imports OOAnalyzer analysis into the recently released Ghidra software reverse engineering (SRE) tool suite. In this post, I will explain how to use this new OOAnalyzer Ghidra Plugin to import C++ class information into Ghidra and interpret the results in the Ghidra SRE framework.
The Ghidra SRE tool suite was publicly released by the National Security Agency. This framework provides many useful reverse engineering services, including disassembly, function partitioning, decompilation, and various other types of program analyses. Ghidra is open source and designed to be easily extendable via plugins. We have been exploring ways to enhance Ghidra analysis with the Pharos reverse engineering output, and the OOAnalyzer Ghidra Plugin is our first tool to work with Ghidra.
OOAnalyzer Pharos recovers C++-style classes from executables by generating and solving constraints with XSB Prolog. The information it recovers includes class definitions, virtual function call information, and class relationships such as inheritance and composition. A complete description of the OOAnalyzer reasoning system is available in our paper, Using Logic Programming to Recover C++ Classes and Methods from Compiled Executables, which was presented at the ACM Conference on Computer and Communications Security (CCS) 2018. OOAnalyzer produces a JSON file with information on recovered C++ classes.
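To give a feel for how such a JSON file might be consumed, here is a minimal Python sketch that summarizes recovered classes. The schema it reads (a top-level "structures" list whose entries carry "name", "size", "methods", and "vftables") is an assumption made for illustration, not the documented OOAnalyzer format, and every address other than the constructor address discussed later in this post is made up. Check the Pharos repository for the real layout before relying on any key name.

```python
import json

# Hypothetical sample. The key names and most values are assumptions for
# illustration and may not match the JSON that OOAnalyzer actually emits.
SAMPLE = """
{
  "structures": [
    {
      "name": "Cls1",
      "size": 20,
      "methods": [
        {"ea": "0x401150", "name": "Cls1::Cls1", "type": "ctor"},
        {"ea": "0x4011a0", "name": "Cls1::virt_meth", "type": "meth"}
      ],
      "vftables": [{"ea": "0x402138", "length": 1}]
    }
  ]
}
"""


def summarize_classes(doc):
    """Return one summary dict per class found under the assumed schema."""
    summaries = []
    for cls in doc.get("structures", []):
        methods = cls.get("methods", [])
        summaries.append({
            "name": cls["name"],
            "size": cls.get("size", 0),
            "constructors": [m["name"] for m in methods
                             if m.get("type") == "ctor"],
            "method_count": len(methods),
            "vftable_count": len(cls.get("vftables", [])),
        })
    return summaries


if __name__ == "__main__":
    for s in summarize_classes(json.loads(SAMPLE)):
        print(f'{s["name"]}: {s["size"]} bytes, '
              f'{s["method_count"]} methods, '
              f'{s["vftable_count"]} vftable(s), '
              f'constructors={s["constructors"]}')
```

A quick summary like this can be a useful sanity check that the JSON file belongs to the executable you are about to annotate.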
The OOAnalyzer Ghidra Plugin
We recognized early on that Pharos tools would be more useful to analysts if they integrated with other reverse engineering frameworks. Thus, we traditionally imported OOAnalyzer Pharos output into the IDA Pro Disassembler via our OOAnalyzer IDA Plugin. The new OOAnalyzer Ghidra plugin is a standard Ghidra extension that can load, parse, and apply OOAnalyzer Pharos results to object-oriented C++ executables in a Ghidra project. The plugin is accessible in Ghidra via a new CERT menu, as shown in Figure 1. When launched, the plugin prompts for a JSON file that OOAnalyzer Pharos produced by analyzing the same executable. It also provides options for organizing recovered C++ data structures (more on this below). Upon loading the JSON file, types and symbols are updated in Ghidra to reflect the C++ data structures found by OOAnalyzer Pharos.
Figure 1: Launching the CERT OOAnalyzer Ghidra Plugin
Representing C++ Data Structures in Ghidra
C++ classes generally include methods and members. Ghidra displays these components to an analyst through a combination of the symbol tree, where program symbol information is stored, and the data type manager, where data type information is stored. Combined, these two components enable the viewing of recovered C++ data structures in Ghidra.
A before-and-after snapshot of the Ghidra symbol tree is shown in Figure 2. On the left side is the information for Cls1 prior to importing OOAnalyzer Pharos analysis. The Cls1 component already contains some class information, such as run-time type information (RTTI). The OOAnalyzer Ghidra plugin updates this information to include methods found by OOAnalyzer Pharos, such as constructors, destructors, and virtual functions. For example, the right side of Figure 2 shows that the OOAnalyzer Ghidra plugin was able to import information about a constructor method, labeled Cls1::Cls1 as it would be in C++ by convention, and a virtual function.
Figure 2: Ghidra symbol tree prior to, and after OOAnalyzer updates are applied.
The symbol tree applies names and labels to a disassembly and decompilation listing. Symbols are organized into different groups, including classes and functions. Once OOAnalyzer symbols are added to the tree, they are automatically recognized and applied by Ghidra. We find this especially useful in decompilation. Consider the methods decompiled by Ghidra in Figures 3 and 4. Prior to importing OOAnalyzer Pharos results, Ghidra does not know that method FUN_00401150 is a constructor for Cls1. After this information is added to the symbol table, the OOAnalyzer Ghidra plugin uses it to correct the calling convention (__fastcall is changed to __thiscall), update the return type (void is changed to Cls1*), and fix the function parameters (undefined4 *param_1 is replaced, as shown in Figure 4).
Figure 3: Class method prior to symbol tree update.
Figure 4: Class method after symbol tree update.
The symbol tree does not contain any information about type definitions. The complete specifications for recovered C++ data structure types are instead inserted into the Ghidra data type manager, shown in Figure 5, which is where Ghidra stores well-defined type information.
Figure 5: Ghidra data type manager with OOAnalyzer type information imported.
Note that new and updated types are organized into a directory named OOAnalyzer. This arrangement allows analysts to see exactly which types have been updated via the OOAnalyzer plugin (more on this below). Up to two kinds of structures are created or updated for each C++ class imported via OOAnalyzer: a C++ class type structure that contains the class members, and one or more virtual function table structures that hold virtual function information. Class members may include traditional primitive type members, complex types (such as other classes), and parent classes. Figure 6 shows the definitions for a C++ class type, a primitive member (mbr_50), and two parents (including Cls2). We found that the best way to handle parents is to treat them as implicit class members, where an entire copy of each parent is embedded in the child object.
Figure 6: Ghidra structure editor for OOAnalyzer-recovered C++ class.
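The idea of parents as embedded members can be sketched with Python's ctypes module, which lets us lay out structures and inspect member offsets. This is only an illustration of the layout concept, not what the plugin does internally: the names ParentA, ParentB, and Child, the member names, and the 32-bit field sizes are all hypothetical.

```python
import ctypes


# All names and sizes here are hypothetical. They only mirror the idea that
# each parent is embedded as a complete member of the child structure.
class ParentA(ctypes.Structure):
    _fields_ = [("vfptr", ctypes.c_uint32),   # 32-bit vftable pointer
                ("mbr_4", ctypes.c_uint32)]


class ParentB(ctypes.Structure):
    _fields_ = [("vfptr", ctypes.c_uint32),
                ("mbr_4", ctypes.c_uint32)]


class Child(ctypes.Structure):
    _fields_ = [("parent_a", ParentA),        # entire copy of first parent
                ("parent_b", ParentB),        # entire copy of second parent
                ("mbr_16", ctypes.c_uint32)]  # the child's own member


def layout(struct_type):
    """List (member name, byte offset, byte size) for each member."""
    return [(name, getattr(struct_type, name).offset, ctypes.sizeof(ftype))
            for name, ftype in struct_type._fields_]


if __name__ == "__main__":
    for name, offset, size in layout(Child):
        print(f"{offset:#04x}  {name}  ({size} bytes)")
```

Because each parent occupies a contiguous range inside the child, a pointer to the child plus a parent's offset can be treated as a pointer to that parent, which is why this representation works well in a structure editor.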
The original OOAnalyzer plugin was designed with IDA Pro in mind. Ghidra has many similar features, and some different ones, to consider when applying OOAnalyzer results. The representation we chose for C++ objects in Ghidra is a work in progress, and we continue to explore how the features available in Ghidra can work with OOAnalyzer. In particular, the way that Ghidra handles decompilation in the presence of well-defined C++ data structures that may be bound dynamically, such as virtual function pointers, requires more study. In the meantime, we think the plugin can be useful for reverse engineers and malware analysts. The following subsections describe other design decisions that we made in the OOAnalyzer Ghidra plugin.
Incorporating Ghidra-Defined Types
Ghidra includes a fairly complete set of types for standard and well-known data structures, such as types in the standard namespace. Ghidra also has an analysis pass that recovers and applies RTTI. The presence of these types prior to importing OOAnalyzer information is welcome because it provides more information for analysis; however, the OOAnalyzer plugin must take care to determine the best way to combine the existing type information with new insights from Pharos. Rather than discard the Ghidra-provided information, the plugin evaluates it and merges it with information generated by OOAnalyzer Pharos to produce a more complete type definition. The comparison and combination are based on many factors, including the number of members defined and the data type size.
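As a rough illustration of what merging on member count and size could look like, consider the following Python sketch. It is not the plugin's algorithm: the TypeDef class, the merge_types function, the ranking rule, and the sample member names are all hypothetical, and the real plugin weighs more factors than the two shown here.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Hypothetical stand-in for a structure type definition."""
    name: str
    size: int
    members: dict = field(default_factory=dict)  # byte offset -> member name


def merge_types(existing, recovered):
    """Merge two definitions of the same type (illustrative heuristic only).

    The definition with more members (ties broken by larger size) is the
    base and wins when both define a member at the same offset. Offsets
    that only the other definition knows about are kept as well.
    """
    base, other = sorted([existing, recovered],
                         key=lambda t: (len(t.members), t.size),
                         reverse=True)
    merged = dict(other.members)
    merged.update(base.members)  # base wins on conflicting offsets
    return TypeDef(name=base.name,
                   size=max(existing.size, recovered.size),
                   members=dict(sorted(merged.items())))


if __name__ == "__main__":
    ghidra_type = TypeDef("Cls1", 12, {0: "vfptr", 8: "field_8"})
    pharos_type = TypeDef("Cls1", 20, {0: "vftptr_0", 4: "mbr_4", 16: "mbr_16"})
    print(merge_types(ghidra_type, pharos_type))
```

The point of the sketch is the design choice described above: neither source of information is thrown away, and the richer definition takes precedence only where the two disagree.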
Class Usages and Virtual Function Calls
Ghidra's decompiler automatically incorporates type information into its analysis. This feature makes the explicit application of structure types, which was required by IDA Pro, unnecessary. For example, consider the virtual function calls shown in Figure 7. On the left is the disassembly, and on the right is the decompilation. Ghidra was able to automatically determine which class and virtual function table structure to apply by incorporating the defined types into decompilation.
Figure 7: Virtual function calls in Ghidra
We create this representation by adding new structure types to Ghidra that represent virtual function tables, with "members" that represent the individual virtual functions. As noted above, we are still working on the best way to represent these relationships, given that virtual function table types are bound to object pointers at runtime.
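The following Python sketch models why typing a virtual function table as a structure helps: once each slot is a named "member," an indirect call through a table offset can be given a name. The classes, function names, and method names here are hypothetical, and the sketch assumes a 32-bit target with 4-byte table entries.

```python
from dataclasses import dataclass

POINTER_SIZE = 4  # assumption: 32-bit target, one pointer per table slot


@dataclass
class VftableType:
    """Hypothetical model of a virtual function table structure type."""
    name: str
    slots: list  # virtual function names, in slot order

    def member_at(self, offset):
        """Return the virtual function 'member' at a byte offset."""
        index, remainder = divmod(offset, POINTER_SIZE)
        if remainder or not 0 <= index < len(self.slots):
            raise ValueError(f"no slot at offset {offset:#x}")
        return self.slots[index]


@dataclass
class ClassType:
    """Hypothetical model of a class structure type."""
    name: str
    vfptrs: dict  # offset of vftable pointer in the object -> VftableType


def resolve_vcall(cls, vfptr_offset, slot_offset):
    """Name the statically declared target of a virtual call.

    Models a call through the table pointer stored at vfptr_offset in the
    object, using the entry at slot_offset within that table.
    """
    return cls.vfptrs[vfptr_offset].member_at(slot_offset)


if __name__ == "__main__":
    table = VftableType("Cls1_vftable", ["Cls1::virt_0", "Cls1::virt_1"])
    cls1 = ClassType("Cls1", {0: table})
    print(resolve_vcall(cls1, 0, 4))
```

Note that this lookup only names the function in the table attached to the declared type. At runtime the object may point at a derived class's table, which is exactly the dynamic binding question that remains open.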
Organizing Changes in the OOAnalyzer Namespace
The last notable feature of the OOAnalyzer Ghidra plugin is its ability to add all types created or updated by the plugin to a special OOAnalyzer namespace in the Ghidra symbol tree. Ghidra uses namespaces to organize symbols and define scope. For example, symbols that are taken from the "std" C++ namespace are placed in the "std" Ghidra namespace by default. The OOAnalyzer Ghidra plugin moves all updated symbols and types to a new namespace named OOAnalyzer. This restructuring makes it easy to identify what was updated by the plugin. If this organization is not preferred, it can be disabled when the plugin is loaded.
Ghidra is a compelling new tool for reverse engineers and malware analysts. It provides many interesting new features that we are still working to understand and to leverage effectively with the Pharos Binary Analysis Framework. Be sure to keep an eye on the Pharos GitHub repository and the SEI Blog for the latest updates to our work.
- Ghidra Framework: https://ghidra-sre.org/
- Ghidra source code on GitHub: https://github.com/NationalSecurityAgency/ghidra
- In-depth description of OOAnalyzer: Using Logic Programming to Recover C++ Classes and Methods from Compiled Executables: https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=539759
- SEI blog posts on the Pharos framework: https://insights.sei.cmu.edu/searchresults.html#stq=pharos&stp=1