Semantic Equivalence Checking of Decompiled Binaries
Software Engineering Institute
This collaborative effort was informed by work with personnel who have significant experience in software assurance at the DoD and who are familiar with the types of issues that DoD faces when performing software assurance on binary code.
Our tool can identify which decompiled functions are likely to be semantically equivalent to the original binary function and which are unlikely to be equivalent. Our ultimate goal is to make it practical for DoD to use existing source-code static analyzers and repair tools on decompiled code, thereby increasing trust in the software. The figure below shows a pipeline for using our tool in practice. The binary code is decompiled with Ghidra, and the resulting decompiled code is compared for semantic equivalence to the LLVM produced by an independent binary lifter such as RetDec. Only those functions that are semantically equivalent are passed along for static analysis and repair.