What We Do and How We Do It
To understand how a decompiler works, first consider what a
compiler does. A compiler can be understood as simply
a language translator, taking as input the well-formed statements
of one language and applying a set of transformational rules
in order to output statements in a different language (which
we'll refer to as the assembler language of the system).
The authors of the compiler must exhaustively examine all
allowable syntactic variations of each statement permitted
in the source language, and then determine which transformations
must be applied to that statement, to generate one or more
statements in the assembler language. It doesn't matter if
the original statement requires a task to be performed by
the operating system, is doing an arithmetic operation, or
is testing a condition and branching as a result of that test.
The principle remains the same. The authors of the compiler
have only one hard rule: the statements in both source and
assembler languages must be functionally equivalent, that
is, they must be understood as performing the same task. Outside
of that the authors are free to apply any rules they choose.
For example, if a statement such as x = y + (q * n) / y is
allowed, the authors will certainly write an algorithm that
pushes and pops operators according to precedence. But the
authors may note that, in the case of x = y + z + q, the algorithm
isn't needed, and instead translate the statement by loading y,
adding z, adding q, and storing the result in x. Because there
is no change in precedence, they may apply the same shortcut to
a chain of any length, x = y + z + … + q, or fall back to the
general algorithm once an expression has more than a certain
number of operands.
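The two translation rules described above can be illustrated with a
small sketch. This is a toy, not any real compiler's method: the
instruction names (PUSH, POP, LOAD, STORE, ADD, and so on), the
function names, and the MAX_CHAIN cutoff are all assumptions made up
for the example, and the input is assumed to be a well-formed
assignment.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
MAX_CHAIN = 8  # hypothetical cutoff: longer chains use the general rule


def tokenize(expr):
    # Split into identifiers, operators, and parentheses.
    return re.findall(r"[A-Za-z_]\w*|[-+*/()]", expr)


def compile_general(target, tokens):
    """General rule: operators wait on a stack until precedence lets them out."""
    out, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                out.append(OPCODES[ops.pop()])
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(OPCODES[ops.pop()])
            ops.pop()  # discard the matching "("
        else:
            out.append(f"PUSH {tok}")
    while ops:
        out.append(OPCODES[ops.pop()])
    out.append(f"POP {target}")
    return out


def is_add_chain(tokens):
    # True for a + b + ... with plain identifiers and no other operators.
    if len(tokens) % 2 == 0:
        return False
    operands, operators = tokens[0::2], tokens[1::2]
    return (all(re.fullmatch(r"[A-Za-z_]\w*", t) for t in operands)
            and all(t == "+" for t in operators))


def compile_add_chain(target, tokens):
    """Shortcut rule: load the first operand, add the rest in order, store."""
    operands = tokens[0::2]
    return ([f"LOAD {operands[0]}"]
            + [f"ADD {name}" for name in operands[1:]]
            + [f"STORE {target}"])


def compile_assignment(stmt, max_chain=MAX_CHAIN):
    target, expr = (part.strip() for part in stmt.split("=", 1))
    tokens = tokenize(expr)
    if is_add_chain(tokens) and len(tokens[0::2]) <= max_chain:
        return compile_add_chain(target, tokens)
    return compile_general(target, tokens)
```

With these assumed rules, compile_assignment("x = y + z + q") takes
the shortcut and yields LOAD y, ADD z, ADD q, STORE x, while
compile_assignment("x = y + (q * n) / y") goes through the precedence
stack. Both outputs are functionally equivalent to their source
statements, which is the only hard requirement.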
A decompiler, then, is a language translator that reverses
the source and assembler languages. It takes as input the
assembler language and produces as output the original source
language. The decompiler has an advantage, though, in that
it doesn't have to be a full translator, but instead has only
to be concerned with the original rules as applied in the
compiler. A decompiler doesn't have to translate every possible
sequence of instructions, but only those that could have been
produced by the application of the original rules. The development
of the decompiler is based on the discovery of the rules and
then reversing them.
If x = p(y) produces op1op2…opn, then op1op2…opn produces x
= p(y). In principle, a complete decompiler can be written
by thoroughly examining all allowable statements permitted
in the source language.
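As a rough illustration of reversing rules, the sketch below assumes a
toy compiler with exactly two rules: an addition chain becomes
LOAD a, ADD b, …, STORE x, and any other expression becomes a
postfix sequence of PUSH and operator instructions ending in POP x.
The instruction set and every function name here are hypothetical. The
decompiler only tries to recognize sequences those two rules could have
produced, and rejects everything else.

```python
SYMBOLS = {"ADD": "+", "SUB": "-", "MUL": "*", "DIV": "/"}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPERAND = 3  # precedence assigned to a bare operand (binds tightest)


def decompile_add_chain(code):
    """Reverse of: x = a + b + ...  ->  LOAD a, ADD b, ..., STORE x."""
    if len(code) < 2:
        return None
    first, *middle, last = [ins.split() for ins in code]
    if len(first) != 2 or first[0] != "LOAD":
        return None
    if len(last) != 2 or last[0] != "STORE":
        return None
    if any(len(m) != 2 or m[0] != "ADD" for m in middle):
        return None
    operands = [first[1]] + [m[1] for m in middle]
    return f"{last[1]} = " + " + ".join(operands)


def decompile_stack(code):
    """Reverse of the precedence rule: rebuild infix text from postfix code."""
    if not code:
        return None
    stack = []  # entries are (text, precedence of the top-level operator)
    for ins in code[:-1]:
        parts = ins.split()
        if len(parts) == 2 and parts[0] == "PUSH":
            stack.append((parts[1], OPERAND))
        elif len(parts) == 1 and parts[0] in SYMBOLS and len(stack) >= 2:
            sym = SYMBOLS[parts[0]]
            prec = PRECEDENCE[sym]
            (right, rprec), (left, lprec) = stack.pop(), stack.pop()
            if lprec < prec:
                left = f"({left})"
            if rprec <= prec:  # keeps a - (b - c) distinct from a - b - c
                right = f"({right})"
            stack.append((f"{left} {sym} {right}", prec))
        else:
            return None  # not something the assumed rules could emit
    last = code[-1].split()
    if len(last) != 2 or last[0] != "POP" or len(stack) != 1:
        return None
    return f"{last[1]} = {stack[0][0]}"


def decompile(code):
    # Try each reversed rule in turn; the first one that matches wins.
    for rule in (decompile_add_chain, decompile_stack):
        source = rule(code)
        if source is not None:
            return source
    raise ValueError("sequence was not produced by a known rule")
```

Under these assumptions, the code for x = y + (q * n) / y comes back
as x = y + q * n / y. The parentheses differ from the original text,
but the statement is functionally equivalent, which is what reversing a
rule can guarantee.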
The production of a decompiler would seem to be a very simple
process, one that could almost be automated. Simply create
a source with all allowable variations of all statements,
compile it, determine the rules, write the code, and it's done.
Unfortunately, there are some real-world constraints. First,
there will be multiple versions of the compiler, and the way
operations are translated can change from one version to the next.
Without the ability to compile source using each of the compiler
versions, the only way to establish which source statement
was used is to examine the assembler instructions and find
a functional equivalent in the source language. Second, post-link
optimizers can compel a second set of transformations to be
performed. Third, although a decompiler may be complete at
any given time, the compiler itself will continue to change,
and the decompiler must keep pace with those changes.
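One way to picture the first and third constraints is a separate rule
table per compiler version, extended whenever a new version appears.
The sketch below is only an illustration: the version labels "v1" and
"v2", the instruction names, and the patterns are invented, and a real
rule set would be far larger.

```python
import re

# Hypothetical rule tables. Each entry maps an instruction pattern emitted
# by one compiler version back to a source-language template.
RULES_BY_VERSION = {
    "v1": [(r"LOAD (\w+);ADD (\w+);STORE (\w+)", r"\3 = \1 + \2")],
    "v2": [(r"PUSH (\w+);PUSH (\w+);ADD;POP (\w+)", r"\3 = \1 + \2")],
}


def decompile(code, versions=None):
    """Return (version, source) for the first rule that matches, else None."""
    text = ";".join(code)
    for version in (versions or RULES_BY_VERSION):
        for pattern, template in RULES_BY_VERSION[version]:
            match = re.fullmatch(pattern, text)
            if match:
                return version, match.expand(template)
    return None
```

Both assumed versions translate the same statement differently, yet
each sequence maps back to the same source. Supporting a new compiler
release then means adding a table rather than rewriting the decompiler.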
By approaching the process of source decompilation as described
above, JuggerSoft has been able to write and license decompilers
on four architectures and in five languages. As the authors
of the only licensable decompilers for mainframe and midrange
computers, JuggerSoft's developers are accepted worldwide
as the premier experts in this field.