JuggerSoft - Decompilers

To understand how a decompiler works, it helps to first view a compiler as a language translator. A compiler takes as input the well-formed statements of one language and applies a set of transformation rules in order to output statements in a different language (which we'll refer to as the assembler language of the system).

The authors of the compiler must exhaustively examine all allowable syntactic variations of each statement permitted in the source language, and then determine which transformations must be applied to that statement to generate one or more statements in the assembler language. It doesn't matter whether the original statement requires a task to be performed by the operating system, performs an arithmetic operation, or tests a condition and branches as a result of that test. The principle remains the same. The authors of the compiler have only one hard rule: the statements in the source and assembler languages must be functionally equivalent, that is, they must be understood as performing the same task. Outside of that, the authors are free to apply any rules they choose.

For example, if a statement such as x = y + (q * n) / y is allowed, the authors will certainly write an algorithm that pushes and pops the needed operations according to precedence. But the authors may note that, in the case of x = y + z + q, the algorithm isn't needed, and instead translate the statement by loading y and then adding z and then q. Because there is no change in precedence, they may extend that rule to chains of any length, x = y + z + … + q, or fall back to the algorithm once an expression uses more than a certain number of operands.
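The two translation rules described above can be sketched in Python. This is only an illustration under stated assumptions: the instruction names (PUSH, POP, ADD, SUB, MUL, DIV for a stack, and LOAD, ADDM, STORE for an accumulator) are a hypothetical assembler language invented for the example, and the precedence algorithm shown is the standard shunting-yard method, which is not necessarily what any particular compiler uses.

```python
import re

# Hypothetical instruction set, invented for this illustration only.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def tokenize(expr):
    """Split an expression into identifiers, operators, and parentheses."""
    return re.findall(r"[A-Za-z_]\w*|[-+*/()]", expr)


def compile_by_precedence(tokens):
    """General rule: push and pop operators by precedence (shunting-yard)."""
    code, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Emit any pending operators that bind at least as tightly.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                code.append(OPCODES[stack.pop()])
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                code.append(OPCODES[stack.pop()])
            stack.pop()  # discard the "("
        else:
            code.append(f"PUSH {tok}")
    while stack:
        code.append(OPCODES[stack.pop()])
    return code


def compile_assignment(stmt):
    """Translate 'target = expression' into hypothetical assembler lines."""
    target, expr = [part.strip() for part in stmt.split("=", 1)]
    tokens = tokenize(expr)
    operators = [t for t in tokens if t in PRECEDENCE]
    if "(" not in tokens and operators and all(t == "+" for t in operators):
        # Shortcut rule: a flat chain of additions needs no precedence logic.
        operands = [t for t in tokens if t not in PRECEDENCE]
        code = [f"LOAD {operands[0]}"] + [f"ADDM {v}" for v in operands[1:]]
        return code + [f"STORE {target}"]
    return compile_by_precedence(tokens) + [f"POP {target}"]


print(compile_assignment("x = y + (q * n) / y"))
print(compile_assignment("x = y + z + q"))
```

Both outputs are functionally equivalent to their source statements, yet they come from different rules, which is exactly the freedom the compiler authors have.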

A decompiler, then, is a language translator that reverses the roles of the source and assembler languages. It takes as input the assembler language and produces as output the original source language. The decompiler has an advantage, though, in that it doesn't have to be a full translator; it only has to be concerned with the original rules as applied in the compiler. A decompiler doesn't have to translate every possible sequence of instructions, only those that could have been produced by applying the original rules. Developing the decompiler consists of discovering those rules and then reversing them.

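In other words, if the compiler translates x = p(y) into the instruction sequence op1 op2 … opn, then the decompiler translates op1 op2 … opn back into x = p(y). It should therefore be possible to write a complete decompiler from a thorough examination of all allowable statements permitted in the source language.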
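As a rough sketch of what reversing the rules can look like, the Python below recognizes two instruction patterns and rebuilds an assignment statement from each. Everything here is assumed for illustration: the instruction set is hypothetical (LOAD, ADDM, STORE for an accumulator-style addition chain; PUSH, ADD, SUB, MUL, DIV, POP for stack-style expression code), and the two patterns stand in for whatever rules a real compiler would turn out to use.

```python
# Hypothetical instruction set and rules, invented for this illustration only.
SYMBOLS = {"ADD": "+", "SUB": "-", "MUL": "*", "DIV": "/"}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
ATOM = 3  # precedence assigned to a bare variable name


def decompile(code):
    """Rebuild 'target = expression' from one of two known instruction patterns."""
    *body, last = code

    # Reverse of the assumed addition-chain rule: LOAD a, ADDM b, ..., STORE x.
    if body and body[0].startswith("LOAD ") and last.startswith("STORE "):
        operands = [body[0].split()[1]]
        for ins in body[1:]:
            op, arg = ins.split()
            if op != "ADDM":
                raise ValueError(f"unexpected instruction in chain: {ins}")
            operands.append(arg)
        return f"{last.split()[1]} = " + " + ".join(operands)

    # Reverse of the assumed stack rule: PUSH operands, apply operators, POP x.
    if last.startswith("POP "):
        stack = []  # entries are (expression text, precedence of its top operator)
        for ins in body:
            if ins.startswith("PUSH "):
                stack.append((ins.split()[1], ATOM))
            elif ins in SYMBOLS:
                sym = SYMBOLS[ins]
                prec = PRECEDENCE[sym]
                right, right_prec = stack.pop()
                left, left_prec = stack.pop()
                # Parenthesize only where the meaning would otherwise change.
                if left_prec < prec:
                    left = f"({left})"
                if right_prec <= prec:
                    right = f"({right})"
                stack.append((f"{left} {sym} {right}", prec))
            else:
                raise ValueError(f"unexpected instruction: {ins}")
        (text, _), = stack
        return f"{last.split()[1]} = {text}"

    raise ValueError("sequence was not produced by a known rule")


print(decompile(["LOAD y", "ADDM z", "ADDM q", "STORE x"]))
print(decompile(["PUSH y", "PUSH q", "PUSH n", "MUL",
                 "PUSH y", "DIV", "ADD", "POP x"]))
```

Note that the sketch rejects any sequence that neither rule could have produced, and that the second result comes back without redundant parentheses: the recovered statement is functionally equivalent to the one compiled, not necessarily identical character for character.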

Producing a decompiler would therefore seem to be a very simple process, one that could almost be automated: create a source file with all allowable variations of all statements, compile it, determine the rules, write the code, and it's done. Unfortunately, there are some real-world constraints. First, there will be multiple versions of the compiler, and the way operations are translated can change from one version to the next. Without the ability to compile source using each of the compiler versions, the only way to establish which source statement was used is to examine the assembler instructions and find a functional equivalent in the source language. Second, post-link optimizers can apply a second set of transformations to the compiled output. Third, although a decompiler may be complete at any given time, the compiler itself will continue to change, and the decompiler must keep up with those changes.
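The first step of that idealized process, creating source that covers the allowable variations, can be sketched for one narrow case. The Python below enumerates assignment statements built from up to a chosen number of operands and four arithmetic operators. The operator list, the variable names, and the function name are all assumptions made for the example; a real source language has far more statement forms than this.

```python
from itertools import product

# Assumed operator set and operand names, chosen only for illustration.
OPERATORS = ["+", "-", "*", "/"]
NAMES = "abcdefgh"


def expression_variants(max_operands):
    """Yield 'x = ...' statements for every operator combination up to a size."""
    if max_operands > len(NAMES):
        raise ValueError("not enough operand names for that many operands")
    for count in range(1, max_operands + 1):
        # One operator sits between each pair of adjacent operands.
        for ops in product(OPERATORS, repeat=count - 1):
            expr = NAMES[0]
            for op, name in zip(ops, NAMES[1:]):
                expr += f" {op} {name}"
            yield f"x = {expr}"


for statement in expression_variants(2):
    print(statement)
```

Each generated statement would then be compiled and its output examined to infer the rule that produced it. The count grows quickly (four operators and three operands already give 21 statements), which hints at why the exhaustive approach is harder in practice than it sounds.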

By approaching the process of source decompilation as described above, JuggerSoft has been able to write and license decompilers on four architectures and in five languages. As the authors of the only licensable decompilers for mainframe and midrange computers, JuggerSoft's developers are accepted worldwide as the premier experts in this field.