Code obfuscation is the technique of applying transformations to a code to make any static analysis (automatic or otherwise) more difficult and more complex. Static analysis groups a series of operations that an analyst can perform on an executable to try to obtain information such as high-level structures and original variables.

Among the techniques of static analysis of software, we can find reverse engineering, also called *reverse engineering*. This technique makes it possible to recover some of the information that allows an analyst to better understand how software works, how it interacts with data, and what algorithms it uses. Obtaining the original code often presents several challenges and pitfalls since machine code does not allow for the expression of loops and high-level data structures.

For some programming languages (such as Java), it is possible to recover the initial code without any problems. It is very likely that by analyzing any Java or .NET-based application, you will be able to find most of the original logic. This is given by the expressiveness of their machine language, such as in Java Bytecode, in which more information can be stored than in the primitive machine language used by the CPU (Assembly).

All reverse engineering tools such as IDA, Ghidra, and Binary Ninja base their operation on certain components such as disassemblers (which allow translation from a raw stream of bytes to machine language instructions) and decompilers (which allow translation from machine language to a high-level C-like language, including information such as strings and loops). These tools have become very powerful over time, and with luck, much of the program logic can be recovered. Decompilers are still approximate; they certainly cannot generate recompilable code; however, through “decompiled” code, it is possible to recreate software.

An example? 3D Pinball Space Cadet is the attempted reconstruction of the popular 3D Pinball game for Windows XP. Analysts were able, through Ghidra and IDA, to reconstruct most of the original data structures, populate them with the correct values, and resurrect the game to port it to different platforms (Nintendo 3DS, AmigaOS, Android). How did they do this? They used a process that mixed reverse engineering and debugging of the original application to be able to figure out how collision management, score updating, and sound management worked.

Reverse engineering allows analysts to go deep with respect to how the software works, what data it relies on, and potentially any weaknesses in the software. A huge Pandora’s box opens here. Some of the information that is included within the software might be restricted to a specific user or might conceal some details that we don’t want to give away to the first person running `strings software_enterprise`

. If we then think about corporate *assets*, the first listed companies are software producers, and paradoxically, software (and algorithms) is their flagship product; they have to protect it.

Imagine that we find within a piece of software a specific sequence of instructions that make the software faster, more performant, or simply better for customers. As a competitor, I might be interested in replicating the same instructions or ideas in my own software to narrow the gap between my software and the competitor’s software. The software might also contain some secrets, such as API keys or internal or private methods, that the originating company might want to hide, which I am interested in as the main adversary.

Very thinly veiled, I introduced some points to motivate the choice of obfuscating the code. Intellectual property protection is one of the main reasons for wanting to obfuscate the code and make static analysis more complex. The algorithms that the company develops to make an application perform better or to make certain choices that have a socio-economic impact are algorithms that constitute a very valuable resource for the attacker. To make analysis more difficult, it is necessary to complicate the information on which decompilers rely: instructions and data. The technique for making any program more complicated is called **obfuscation** and allows a given program P to generate a program P’ that is more difficult to analyze.

In this article, we will introduce POCKET, a program that allows you to have a first approach to obfuscation. Pocket is a command-line program that accepts an MBA (Mixed Boolean Arithmetic) expression, and by applying some transformation rules we are able to make the expression more complex. To better understand how it works and what advantages it has, let us introduce a real-life example.

Once the disassembler has translated the raw byte sequence into assembly microinstructions, the result can look something like this:

```
inc eax
and eax, ebx
and eax, ecx
```

The decompiler can then begin to translate the microinstructions into a simpler language. At first the decompiler works by “pattern matching”: it notes in fact an increment in eax, an and between eax and ebx, and another and between eax and ebx. The commented instructions become thus:

```
inc eax ; eax = eax + 1
and eax, ebx ; eax = eax & ebx
and eax, ecx ; eax = eax & ecx
```

The decompiler notices that in all three expressions `eax`

is the left register used for assignment; the decompiler then rewrites the expressions with a single expression: `eax = ((eax + 1) & ebx) & ecx`

. Assume further that we assign the value `2`

to eax, the value 5 to `ebx`

, and the value `3`

to ecx. The end result of this expression is `3`

. Within a program this value might represent the index of an element, the number of times to iterate within a loop. You might think of it as a nothing value.

But instead of very simple values why can’t we assign a value like `0x80021002`

to eax? The result is `0x80021002`

, however we only need to add a single instruction to make things more interesting. We add `jmp %eax`

. This way, the execution of the next instruction is dynamic: it depends on the value of eax. Thus, as mentioned, we do not worry about the possible values a data item can take.

We are concerned instead with the readability of the expression for a decompiler. The expression `eax = ((eax + 1) & ebx) & ecx`

is very readable: it is 2 simple and operations and a sum. Imagining that it knows the values of eax, ebx, and ecx, the decompiler could even replace eax with the computed value:

```
eax = 0x80021002
eax = ((eax + 1) & ebx) & ecx
// result: 0x80021003
```

The expression we have just analyzed contains algebraic operations (such as sum) and logical operations (AND): therefore, the expression is named Mixed Boolean Arithmetic expression. It is very simple to compute as an expression and requires no special care. How can we make it more inaccessible? For now, let us consider only transformations on the expression.

For example, we know that any sum operation between X and Y `X+Y`

can be transcribed as `(X & Y) + (X | Y)`

. Let us try applying this transformation: in our case X will be eax while Y an immediate value, namely 1. The result of this transformation is `(eax & 1) + (eax | 1)`

and rewriting the expression `eax = (((eax & 1) + (eax | 1) & ebx) & ecx`

. We can then move on to the second transformation involving `X & Y`

. The AND operation between two integers is equivalent to `(X + Y) - (X | Y)`

, in this case `X = (eax & 1) + (eax | 1)`

while `Y = ebx`

: the final result is `(((eax & 1) + (eax | 1)) + ebx) - (((eax & 1) + (eax | 1)) | ebx)`

.

We apply this rule one last time to get the final obfuscated expression:

```
((((((eax & 1) + (eax | 1)) + ebx) - (((eax & 1) + (eax | 1)) | ebx)) + ecx) - (((((eax & 1) + (eax | 1)) + ebx) - (((eax & 1) + (eax | 1)) | ebx)) | ecx))
```

We then understood with a real example the potential of MBA expressions. Obfuscation of these expressions consists of iteratively applying rewriting rules to increase the complexity of these expressions while preserving their semantics. In fact, the expression rewriting rules must not change the result of the expression (because otherwise the obfuscation would be invalid).

The transformation rules are based on certain operations that when performed in a certain order turn out to be equivalent to the original operations.

POCKET is a tool for automatically applying these transformation rules and was written in Rust with the grammar based on Pest. The development of the project was not particularly difficult since I allowed much of the complexity of the parser to be delegated to Pest. The difficulty was more trying to understand Rust’s method of operation, figuring out how to get around some borrowing issues, and an annoying problem with a stack overflow. (Spoiler: I was recursively calling the same operation).

By compiling the project with `cargo compile`

, the `pocket`

binary is generated. We can start it without any problems; it will stand by waiting for an input expression. To resume the expression from before, we can consider: `(A + 1) & B & C`

. POCKET constructs a syntactic tree from the expression that will be traversed to apply the transformation rules. A syntactic tree for the expression in the example is:

```
.
|-- &
|------ +
|--------- A
|--------- 1
|------ &
|-------- B
|-------- C
```

Each node can be one of 4 different types:

**Immediate**: represents an immediate value such as ‘1’**Cont**: represents a SET consisting of a letter of the ASCII alphabet and its value (A, B, C… or a, b, c…)**BinaryOperation**: represents a binary operation. It consists of the contents of the operation (AND, OR, SUM, SUBTract..) and a left and right child. The children are other nodes that define other operations or other terminals.**UnaryOperation**: represents a unary operation. Composed of the symbol before the SET.

Immediate and cont are terminals as they have no children, while BinaryOperation has two children and UnaryOperator one child. Once the tree has been constructed, the tree is visited in its entirety in order to apply the transformation rules: imagine a switch constructed based on the binary operation to be replaced.

- if the operation is
`X & Y`

then apply the substitution`(X + Y) - (X | Y)`

- if the operation is
`X ^ Y`

then apply the substitution`(X | Y) - (X & Y)`

- if the operation is
`X - Y`

then apply the substitution`(X ^ -Y) + 2*(X & -Y)`

- if the operation is
`X + Y`

then apply the substitution`(X & Y) + (X | Y)`

- if the operation is
`X | Y`

then apply the substitution`X + Y + 1 + (~X | ~Y)`

The values of X and Y are replaced each time by the left and right children of the operations, so I can perform the substitution without losing the context I am in. I am working on being able to add additional transformations on the immediate values. If the transformations for binary operations are limited, the transformations for immediate values are potentially infinite: I can write 2 as the difference between 48 and 46, or between 54 and 52 or the difference between 52 and 54 changed sign.

Equivalence transformations result from studies of the compatibility of operators in binary arithmetic. Suppose we have an expression of the type `2 * (x ^ y)`

: this is equivalent to `(2x ^ 2y)`

. The less attentive reader may have confused these two formulas with the simple distributive property: I multiply x and y by two and keep the XOR operation.

**It doesn’t always work that way**!!! Already with `3 * (x ^ y) != (3x ^ 3y)`

the distributive property disappears. Compatibility between formulas and operators is based on the type of set we are considering: the set based on binary arithmetic!

Returning to the example, the result of blurring results as follows:

```
((((((A & 1) + (A | 1)) + B) - (((A & 1) + (A | 1)) | B)) + C) - (((((A & 1) + (A | 1)) + B) - (((A & 1) + (A | 1)) | B)) | C))
```

While if you wanted to have another level of blurring starting from the expression from before, here is the result:

```
((((((((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 +
(1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1)
- (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) ^ (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A
) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) -
(A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))
)) | (~B))))))) + (2 * (((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A |
1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)
))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) & (-(((((A + 1) - (A | 1)) &
A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1
+ ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1
+ ((~A) | (~1)))))))) | (~B))))))))) & C) + (((((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)
)))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A +
(1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) ^ (-(((
(A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) |
(~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) -
(A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))))))) + (2 * (((((((A + 1) - (A | 1)) & (A +
(1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + ((
(((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + (
(~A) | (~1))))))) | B)) & (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A
+ 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (
1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B)
)))))))) | C)) ^ (-(((((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (
A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A)
| (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) ^ (-(((((A + 1)
- (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1
))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) -
(A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))))))) + (2 * (((((((A + 1) - (A | 1)) & (A + (1
+ (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1
) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)
)))))) | B)) & (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) |
(A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)
)))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))))))))) + (C + (1 + ((~(((((
((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A)
| (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1))
| (A + (1 + (1 + ((~A) | (~1))))))) | B)) ^ (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))
)) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) &
(A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))
)))) + (2 * (((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A |
(1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A
+ 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) & (-(((((A + 1) - (A | 1)) & (A + (1 + (1
+ ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A +
1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) |
(~1)))))))) | (~B)))))))))) | (~C))))))) + (2 * (((((((((((A + 1) - (A | 1)) & (A + (1 + (1 +
((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A +
1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((
~A) | (~1))))))) | B)) ^ (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (
A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~
A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))))))) + (2 * ((
(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~
A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (
A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) & (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) |
(~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A
| 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))
))) | (~B))))))))) & C) + (((((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1)
- (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) |
(~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) ^ (-(((((A + 1) - (A | 1)
) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (
B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (
1 + (1 + ((~A) | (~1)))))))) | (~B))))))) + (2 * (((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A)
| (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1))
& (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B))
& (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (
1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) +
(((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))))))))) | C)) & (-(((((((((A + 1
) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))
))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (
A + (1 + (1 + ((~A) | (~1))))))) | B)) ^ (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))
))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (
A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))
) | (~B))))))) + (2 * (((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A |
1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))
))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B)) & (-(((((A + 1) - (A | 1)) & (A
+ (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1
+ ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1
+ ((~A) | (~1)))))))) | (~B))))))))) + (C + (1 + ((~((((((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A)
| (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) & B) + (((((A + 1) - (A | 1))
& (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) | B))
^ (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1
((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((
A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B))))))) + (2 * (((((((A + 1) - (
A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))
& B) + (((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 +
(1 + ((~A) | (~1))))))) | B)) & (-(((((A + 1) - (A | 1)) & (A + (1 + (1 + ((~A) | (~1)))))) + (((
A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1))))))) + (B + (1 + ((~((((A + 1) - (A | 1)) & (A +
(1 + (1 + ((~A) | (~1)))))) + (((A + 1) - (A | 1)) | (A + (1 + (1 + ((~A) | (~1)))))))) | (~B)))
))))))) | (~C)))))))))
```

As we have specified several times, the transformation rules can continue to be applied indefinitely. It is clear that already at the third level of obfuscation the generated expressions are very heavy to visualize. So we ask: How can we actually verify that the transformations have succeeded?

The central dogma ~~of biology~~ of obfuscation reports “all transformations must be applied IF AND ONLY IF the transformation does not change the original result.” More elegantly, we can then say that an obfuscated expression is equivalent to an expression if and only if the obfuscated expression, correctly evaluated, returns the same result as the original expression. To verify the rewrites therefore, I wrote a simple interpreter.

```
fn eval(&self, node: &Node) -> i32{
match node {
Node::Cont(_set, value) => *value,
Node::Immediate(value) => *value,
Node::UnaryExpr { op, child } => {
let child = self.eval(child);
return match op{
UnaryOperator::Not => !child,
UnaryOperator::Minus => -child,
}
},
Node::BinaryExpr { op, lhs, rhs } => {
let left_value = self.eval(lhs);
let right_value = self.eval(rhs);
match op{
Operator::And => left_value & right_value,
Operator::Or => left_value | right_value,
Operator::Xor => left_value ^ right_value,
Operator::Plus => left_value + right_value,
Operator::Minus=> left_value - right_value,
Operator::Mul => left_value * right_value,
}
},
}
}
```

The `eval`

function allows us to evaluate an expression consisting of binary operations, unary operations, immediate values, and sets. Since we find the same operations implemented in Rust the only concern we want to address is on the evaluation of the left and right node. For example taking the usual example `(A + 1) & B & C`

: to perform the AND operation of `(A + 1) & B`

, we first want to visit the right child subtree and the left child subtree.

For the right child, the value is basically given by the value contained within the node (*value is returned from `eval`

). For the left child, on the other hand, it is necessary to evaluate `A + 1`

. So we call for both the `eval`

function for `lhs`

and for `rhs`

, then perform the sum operation. The question that may arise spontaneously concerns the number contained within the SET node. If POCKET accepts an expression, how does it determine the number for each letter?

We do not use random values; to ensure determinism between blurring steps, we assign for now the number of the ASCII character representing the set. For example, `A`

becomes 65, `B`

66, and so on. In the very near future, I will implement a key-value table that will contain a random number for each set. The table is maintained for all obfuscation steps. This way we can verify that the obfuscated expression allows the same value to be calculated as the original expression. All that would be missing is the transformation to assembly code, and then POCKET could really be implemented as a rewriter of binaries.

MBA expressions turned out to be very interesting! The goal of POCKET was to develop a project with Rust that had any utility in the software engineering world. I do not consider POCKET to be mature or complete software, but just a weekend project to better learn how you can manipulate an AST, how to build an interpreter in Rust, and how you can replace some nodes. It is possible to improve the project by including more rules and acting on some edge cases that counterattack programs could eliminate.

The reference paper for taking a more rigorous and formal approach is **Information Hiding in Software with Mixed Boolean-Arithmetic Transforms** by Yongxin Zhou, Alec Main, Yuan X. Gu and Harold Johnson. Recommended reading in order to formally describe MBA expressions!

Can it be said that these obfuscation methods can be safe from counterattacks? Absolutely not. Industry and academic research has produced several papers that include increasingly sophisticated methods of resolving obfuscated expressions. We would tend to want the transformation from original expression to obfuscated expression to be irreversible. However, since the rules of transformations are well known in the literature, it is possible to develop several programs that can resolve and “summarize” obfuscated expressions.

In the next article, we will mention for good possible counterattacks and related papers against MBA obfuscation. Suffice it to say that LLVM has a part of code that serves precisely to condense any obfuscated expressions by pattern matching. Here is an excerpt:

```
// A & (A ^ B) --> A & ~B
if (match(Op1, m_OneUse(m_c_Xor(m_Specific(Op0), m_Value(B)))))
return BinaryOperator::CreateAnd(Op0, Builder.CreateNot(B));
// (A ^ B) & A --> A & ~B
if (match(Op0, m_OneUse(m_c_Xor(m_Specific(Op1), m_Value(B)))))
return BinaryOperator::CreateAnd(Op1, Builder.CreateNot(B));
```

We will have to try to make POCKET immune to these counter attacks. We will implement new operations to make life more complex for programs trying to rewrite obfuscated expressions into simpler expressions. If you have any questions or curiosity about the POCKET project, or just want to report a refusal/error, please contact me at seekbytes@protonmail.com.