The control flow graph is an important building block in static program analysis: it underpins a variety of analyses that consider the flow of a program. The “flow”, described informally, represents the evolution of the program during execution, that is, where the CPU jumps in order to continue program execution.
The graph (also called CFG for brevity) allows the first high-level elements of software to be recovered in a general way from a low- or medium-level representation (for example, assembly code or intermediate code). High-level elements include loops (while or for) and execution branches (switch, if, else) that are critical in identifying how execution evolves over time.
We will focus primarily on the value that the control flow graph has for static analysis of programs for which we have no source code. However, we should not think that its usefulness is limited to analyzing a binary: compilers use the control flow graph to implement a number of optimizations involving expressions and variables. The CFG is the starting point for analyses such as detecting the liveness of a variable or calculating when an expression can be reused.
Constructing the CFG requires no special effort from the analyst since most analysis software already implements graph retrieval from elementary blocks. However, easy solutions are not part of this blog; in this article, we will see how it can be constructed and some examples to show how useful the control flow graph is within static analysis.
Formally, the Control Flow Graph is defined as a graph <V, E>, where V represents the set of elementary blocks of a function and E represents the set of arcs connecting the elementary blocks according to the semantics of the instructions. Alternatively, we can say that the CFG is composed of a set of arcs <b1, b2>, where b1 and b2 are elementary blocks, and the pair <b1, b2> is in the CFG if and only if control can flow from elementary block b1 to elementary block b2.
For those who would like an intuitive definition of the basic structures we are using: an elementary block represents a series of sequential instructions grouped within a single structure called a block. Each elementary block cannot contain more than one flow-change instruction. A sequence of instructions forms an elementary block if, at each position, one instruction “dominates” (i.e., is always executed first) the instructions that follow it, and no other instruction is executed between two consecutive instructions of the sequence.
In other words, an elementary block is composed of a series of instructions such that only the first instruction can be a target of a block (single entry) and only the last instruction can lead to another block (single exit). Within the flow control graph, we can identify a start node that has no incoming arcs and an exit node that has no outgoing arcs.
Every static analysis tool, including decompilers, once the program's instructions have been disassembled and translated into an intermediate representation, proceeds to break them down into elementary blocks. The elementary blocks constitute the nodes of the graph, while the arcs are the links between blocks. Given a block B, the blocks to which control is transferred are called successors of B, while the blocks from which B obtains control are called its predecessors.
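As a concrete (and deliberately minimal) sketch of these definitions, the node/arc structure can be modeled as follows; the names BasicBlock and link are our own, not taken from any particular tool:

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    """An elementary block: a straight-line run of instructions."""
    instructions: list
    successors: list = field(default_factory=list)    # blocks control may flow to
    predecessors: list = field(default_factory=list)  # blocks control may come from

def link(src: BasicBlock, dst: BasicBlock) -> None:
    """Add the arc <src, dst> to the graph."""
    src.successors.append(dst)
    dst.predecessors.append(src)

# tiny usage example: b0 branches to b1 and b2, both of which fall into b3
b0, b1, b2, b3 = (BasicBlock([]) for _ in range(4))
for s, d in [(b0, b1), (b0, b2), (b1, b3), (b2, b3)]:
    link(s, d)
```

With this structure, the successors and predecessors of each block are available directly, which is all the later analyses in this article need.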
To better cast the CFG in the context of the analysis, let us proceed to detail an example of flow control graph construction. Suppose we have the following code (x86 assembly):
fun1:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
cmp DWORD PTR [rbp-4], 1
jle .L2
add DWORD PTR [rbp-4], 3
add DWORD PTR [rbp-8], 2
jmp .L3
.L2:
mov eax, DWORD PTR [rbp-8]
sub eax, 4
mov DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 5
.L3:
mov eax, 0
pop rbp
ret
fun1 is a function we call to set some values within memory addresses (particularly on the stack). We want to try to recover the original high-level code using some static analysis techniques. To do this, we have to recover the control flow, asking whether the program executes linearly or whether the CPU evaluates some conditions.
We divide the code into elementary blocks, knowing that each elementary block can contain only one flow-change instruction: we apply an algorithm that identifies the instructions that start or end a block. The steps to be performed are as follows:

- the first instruction of the function is a leader;
- any instruction that is the target of a (conditional or unconditional) jump is a leader;
- any instruction that immediately follows a jump is a leader.

Generally, instructions that cause a basic block to terminate include: conditional and unconditional jumps, return instructions, and calls to functions that never return (such as panic() or __stack_chk_fail).

Let us proceed to apply the algorithm to the set of instructions we have above. We identify the leaders:
fun1:
push rbp # leader (first instruction)
mov rbp, rsp
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
cmp DWORD PTR [rbp-4], 1
jle .L2
add DWORD PTR [rbp-4], 3 # leader (instruction follows a jump instruction)
add DWORD PTR [rbp-8], 2
jmp .L3
.L2:
mov eax, DWORD PTR [rbp-8] # leader (the instruction is targeted by jle .L2)
sub eax, 4
mov DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 5
.L3:
mov eax, 0 # leader (the instruction is targeted by jmp .L3)
pop rbp
ret
We can thus identify the following elementary blocks:
The first one:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
cmp DWORD PTR [rbp-4], 1
jle .L2
The second one:
add DWORD PTR [rbp-4], 3
add DWORD PTR [rbp-8], 2
jmp .L3
The third one:
mov eax, DWORD PTR [rbp-8]
sub eax, 4
mov DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 5
and then the fourth one:
mov eax, 0
pop rbp
ret
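The leader-identification rules can be sketched in a few lines; the tuple encoding of the listing below is our own simplification (label, mnemonic, operands), not a real disassembler output:

```python
# mnemonics that end a basic block (jumps, conditional jumps, returns)
TERMINATORS = {"jmp", "jle", "jl", "jg", "jge", "je", "jne", "ret"}

def find_leaders(instrs):
    """instrs: list of (label, mnemonic, operands) tuples; label may be None.
    Returns the set of indices of instructions that start a basic block."""
    leaders = {0}  # rule 1: the first instruction is a leader
    labels = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}
    for i, (_, op, arg) in enumerate(instrs):
        if op in TERMINATORS:
            if arg in labels:        # rule 2: the jump target is a leader
                leaders.add(labels[arg])
            if i + 1 < len(instrs):  # rule 3: the instruction after a jump
                leaders.add(i + 1)
    return leaders

fun1 = [
    (None, "push", "rbp"),            # 0
    (None, "mov", "rbp, rsp"),        # 1
    (None, "mov", "[rbp-4], 2"),      # 2
    (None, "mov", "[rbp-8], 5"),      # 3
    (None, "cmp", "[rbp-4], 1"),      # 4
    (None, "jle", ".L2"),             # 5
    (None, "add", "[rbp-4], 3"),      # 6
    (None, "add", "[rbp-8], 2"),      # 7
    (None, "jmp", ".L3"),             # 8
    (".L2", "mov", "eax, [rbp-8]"),   # 9
    (None, "sub", "eax, 4"),          # 10
    (None, "mov", "[rbp-4], eax"),    # 11
    (None, "add", "[rbp-8], 5"),      # 12
    (".L3", "mov", "eax, 0"),         # 13
    (None, "pop", "rbp"),             # 14
    (None, "ret", None),              # 15
]

print(sorted(find_leaders(fun1)))  # [0, 6, 9, 13]
```

Each leader starts one of the four blocks listed above; each block extends up to (but not including) the next leader.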
The division was sequential: we grouped instructions by evaluating each one individually and moving on to the next each time. We can then add the connections between the elementary blocks. For simplicity, let us number the elementary blocks starting at index zero. We need to ask: given an elementary block, where will the CPU jump at runtime?
To summarize, we have the following links:
[ (b0 -> b2), (b0 -> b1), (b1 -> b3), (b2 -> b3) ]
This array forms our flow control graph and allows us to identify an initial high-level flow change construct. We use DOT notation to get a visual picture of the graph:
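A minimal sketch of how the edge list can be turned into DOT (the graph and node names are the ones we chose above):

```python
edges = [("b0", "b2"), ("b0", "b1"), ("b1", "b3"), ("b2", "b3")]

def to_dot(edges, name="cfg"):
    """Render an edge list as a DOT digraph string."""
    lines = [f"digraph {name} {{"]
    lines += [f"    {src} -> {dst};" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)

print(to_dot(edges))
```

The resulting text can be fed to the Graphviz `dot` tool to produce the picture.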
By itself this representation is not very useful; however, we can enrich it by including, for each node (or elementary block), its assembly code.
With this graph, we can see that execution splits into two paths: there must therefore be an evaluation of a guard (i.e., a condition that, when evaluated, returns either true or false) that decides where the flow will jump. To identify this information, we take the first elementary block. There we can see a cmp instruction: in the instruction set architecture of the x86 family, cmp is the instruction used to compare two elements. The result of the comparison is stored in the rflags register (the register that contains all the flags).
cmp DWORD PTR [rbp-4], 1 -- jle .L2
is equivalent to the expression *(rbp-4) <= 1? (jle jumps when the first operand is less than or equal to the second.) This evaluation selects the next elementary block considered by the CPU (remember, however, that the CPU jumps to memory addresses; the elementary-block representation is only for ease of analysis). If the value contained within (rbp-4) is less than or equal to 1, then we switch to execution of elementary block b2; if not, we fall through to b1. The condition can also be written on the corresponding arcs leaving b0.
Let us translate each elementary block into a more C-like representation, remembering that the add and sub instructions are addition and subtraction operations, that push rbp -- mov rbp, rsp form the function prologue (we are not interested in it for now), and that DWORD PTR [rbp-#X] denotes the memory location whose address is rbp - #X.
The control flow graph illustrated above is very simple to understand; with more complex graphs, however, it can be useful to do some rewriting to arrive at code that is as understandable as possible. First we might rewrite the variables involved, removing the DWORD PTR [rbp-#X] notation. Let us proceed in this way: for every occurrence of DWORD PTR [rbp-#X], where #X is an integer, we query a table in which we record that index #X corresponds to a variable named var_#X, and we apply the transformation DWORD PTR [rbp-#X] -> var_#X. Sometimes eliminating this kind of information is dangerous and could lead to transformations that are not semantically equivalent to the source code.

We apply the rule to the first elementary block:
push rbp # ignore instruction: prologue
mov rbp, rsp # ignore instruction: prologue
mov DWORD PTR [rbp-4], 2 # becomes: mov var_4, 2
mov DWORD PTR [rbp-8], 5 # becomes: mov var_8, 5
cmp DWORD PTR [rbp-4], 1 # becomes: cmp var_4, 1
jle .L2 # kept as is
The final code turns out to be for the first elementary block:
mov var_4, 2
mov var_8, 5
cmp var_4, 1
jle .L2
For the second elementary block:
add var_4, 3
add var_8, 2
jmp .L3
For the third elementary block:
mov eax, var_8
sub eax, 4
mov var_4, eax
add var_8, 5
The fourth block needs no modification since it does not access any address of the form [rbp-#X]. Some might ask whether it would not have made more sense to apply the transformation directly on the linear code instead of constructing the control flow graph: we will see later that the graph allows us to apply these transformations very easily and in local contexts, without worrying about disastrous effects on the rest of the instructions.
The Static Single Assignment form is a very powerful intermediate representation that allows important information about the variation of variable values to be recovered. The form is currently used in static analysis, decompilation, and within optimizing compilers. As a first step in this direction, we rewrite each mov first_parameter, second_parameter as a single assignment first_parameter = second_parameter.
With enough cunning and intuition, it is possible to rewrite the previous assembly in this assignment form. To improve the output of the transformation we can also consider arithmetic or logical operations that follow a mov and share its first parameter. The rule is: mov first_parameter, second_parameter is rewritten as the assignment first_parameter = second_parameter.
The final code turns out to be for the first elementary block:
var_4 = 2
var_8 = 5
cmp var_4, 1
jle .L2
For the second elementary block:
var_4 = var_4 + 3
var_8 = var_8 + 2
jmp .L3
For the third elementary block:
eax = var_8
eax = eax - 4
var_4 = eax
var_8 = var_8 + 5
For the fourth elementary block:
eax = 0
pop rbp # we ignore this instruction: epilogue
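The per-instruction rewriting used above can be sketched as follows, assuming Intel-syntax operands separated by a comma; only mov, add, and sub are handled, everything else is left untouched:

```python
def to_assignment(line):
    """Rewrite 'mov a, b' as 'a = b', 'add a, b' as 'a = a + b',
    and 'sub a, b' as 'a = a - b'. Other instructions pass through."""
    op, _, rest = line.partition(" ")
    if op not in ("mov", "add", "sub"):
        return line  # leave cmp/jmp/ret/... untouched
    dst, src = [p.strip() for p in rest.split(",", 1)]
    if op == "mov":
        return f"{dst} = {src}"
    sign = "+" if op == "add" else "-"
    return f"{dst} = {dst} {sign} {src}"

block3 = ["mov eax, var_8", "sub eax, 4", "mov var_4, eax", "add var_8, 5"]
print([to_assignment(l) for l in block3])
```

Because each block is straight-line code, this rewrite can be applied block by block without looking at the rest of the graph.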
We note that, particularly for the third elementary block, another transformation can be applied to improve the readability of the code. When we find an instruction that computes an arithmetic or logical operation immediately after an assignment, and the first_parameter is shared, we can fold the operation into the assignment.
The third elementary block becomes thus:
var_4 = var_8 - 4
var_8 = var_8 + 5
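This folding step can be sketched as a tiny substitution pass over a straight-line block; treating eax as the only temporary is an assumption valid for this example:

```python
def fold_temporaries(stmts, temps=("eax",)):
    """Substitute temporary registers through a straight-line block of
    'dst = expr' statements, dropping assignments to dead temporaries."""
    env, out = {}, []
    for s in stmts:
        dst, expr = [p.strip() for p in s.split("=", 1)]
        # replace each token that names a known temporary with its value
        expr = " ".join(env.get(tok, tok) for tok in expr.split())
        if dst in temps:
            env[dst] = expr           # remember the temporary's current value
        else:
            out.append(f"{dst} = {expr}")
    return out

block3 = ["eax = var_8", "eax = eax - 4", "var_4 = eax", "var_8 = var_8 + 5"]
print(fold_temporaries(block3))  # ['var_4 = var_8 - 4', 'var_8 = var_8 + 5']
```

This is only safe inside a single block: the temporary must not be live at the block's exit, which is exactly the kind of fact the CFG lets us check locally.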
In the last transformation we try to recover the control flow of the code, to rewrite it with the constructs we are used to seeing: in this case, if and else. The instructions that remain to be inspected are the cmp and the conditional or unconditional jumps.
The compare instruction (cmp) in x86 is semantically a subtraction between the two operands to be compared. For example, cmp eax, 1 performs the subtraction between eax and 1. As a side effect, the subtraction sets some flag bits within the rflags register: these flags are what conditional jumps consult (each conditional jump specifies the flags to be considered).
Let us focus our attention on the first elementary block so that we can understand how the jump works. It consists of a series of assignment instructions followed by:
cmp var_4, 1
jle .L2
The compare instruction performs a subtraction between var_4 and 1. This may seem like a choice that makes no sense: why is it necessary to perform a subtraction? Could not the designers of Intel's ISA directly add instructions such as cmpe or cmple? Looking at the sign of the result of the subtraction, we can identify three different conditions:

- var_4 is less than 1;
- var_4 is greater than 1;
- var_4 is equal to 1.

You can combine the conditions of the SIGN flag and the ZERO flag to figure out whether a number is greater than or equal, or less than or equal. The instruction pair we showed earlier can thus be rewritten as if (var_4 <= 1), with the conditional jump's argument representing the address to jump to when the condition is true (i.e., elementary block #2). What happens, however, if the condition is false? The CPU simply continues to execute instructions sequentially: looking at the code in its entirety, it will proceed with the instructions contained in elementary block #1.
Returning to the instructions in the first elementary block, let us state in natural language the transformation we want to apply: when an instruction is a cmp and the next is a conditional jump, we write if (condition) { label }, where the condition's operands are those of the cmp and the comparison operator is given by the type of the conditional jump (le for Less or Equal, g for Greater, …). Between the curly brackets we put the number of the elementary block to jump to.

The two instructions then become if (var_4 <= 1) { b2 }. Since the other elementary block (i.e., the false branch) is reachable only if (var_4 > 1), we can use the else construct to represent the false branch of the guard. The final code is if (var_4 <= 1) { b2 } else { b1 }.
A further transformation is the union of all the elementary blocks, to try to reassemble the initial code from which we started. We must be careful to place the elementary blocks in the correct positions: following the transformation of the if-else branch, block #2 goes inside the true branch of the if and block #1 inside the false branch. We also add final semicolons to make the result more C-like.
var_4 = 2;
var_8 = 5;
if (var_4 <= 1) {
var_4 = var_8 - 4;
var_8 = var_8 + 5;
} else {
var_4 = var_4 + 3;
var_8 = var_8 + 2;
}
eax = 0;
However, it would be odd to leave the last statement referencing a register: why is the eax register set to 0 if it is not used afterwards? In fact, one detail has been omitted, and it concerns the main feature of this assembly code: it was extracted from a function (you can recognize that an assembly listing is a function by identifying the prologue and epilogue!). The x86 calling convention specifies that the return value of a function is always placed in the eax register. If the last instruction is eax = value, we can replace it with return value. Ultimately, the code results:
int fun1() {
var_4 = 2;
var_8 = 5;
if (var_4 <= 1) {
var_4 = var_8 - 4;
var_8 = var_8 + 5;
} else {
var_4 = var_4 + 3;
var_8 = var_8 + 2;
}
return 0;
}
This example has no particular significance at the level of code semantics, but we can notice an important feature that was perhaps more hidden before, concerning the values of the variables. Since we know the values of the variables a priori, and since those values are constant, we can apply optimization techniques to simplify the instructions (the if condition will always be false!). This was not obvious to us because low-level code is very hard to read; a compiler, however, through the division into elementary blocks, could have noticed the always-false condition, dropped elementary block #2, and eliminated the two execution branches.

The (trivial) example makes clear the importance of partitioning into elementary blocks and applying local transformations within blocks. Every time we compile some source code, the compiler applies optimization techniques based precisely on the control flow graph, such as eliminating dead code (unused code) or reordering instructions. To add further value to the use of the control flow graph, a more complex example follows.
Let us look at a more complex example to analyze the flow control graph and obtain information with respect to loops. The goal is to reconstruct the original code and see if we can also derive what this particular function does with respect to one or more locally used parameters.
f2:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-12], 4
mov DWORD PTR [rbp-4], 0
mov DWORD PTR [rbp-8], 1
jmp .label_A
.label_B:
mov eax, DWORD PTR [rbp-8]
imul eax, DWORD PTR [rbp-4]
mov DWORD PTR [rbp-8], eax
add DWORD PTR [rbp-4], 1
.label_A:
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-12]
jl .label_B
mov eax, 0
pop rbp
ret
We apply the division into elementary blocks and construct the flow control graph. The first elementary block turns out to be:
push rbp # prologue
mov rbp, rsp # prologue
mov DWORD PTR [rbp-12], 4
mov DWORD PTR [rbp-4], 0
mov DWORD PTR [rbp-8], 1
jmp .label_A
The second elementary block turns out to be (label_B):
mov eax, DWORD PTR [rbp-8]
imul eax, DWORD PTR [rbp-4]
mov DWORD PTR [rbp-8], eax
add DWORD PTR [rbp-4], 1
The third elementary block (label_A):
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-12]
jl .label_B
And the fourth:
mov eax, 0
pop rbp
ret
Examining the blocks one by one to understand the connections between them, we obtain the following links:
[ (b0 -> b2), (b1 -> b2), (b2 -> b1), (b2 -> b3) ]
Applying DOT notation, we can represent the graph in this way:
This representation does not convey much information; we can enrich it by including, for each node (or elementary block), the assembly code associated with it.
In this case it is less obvious how to recover the control flow; however, compared with the previous example, we note that the graph contains a cycle (given by the arcs b1 -> b2 and b2 -> b1). This feature suggests that the initial code contains a loop: a portion of code that is executed multiple times under the evaluation of a specific guard.
The next step is to apply the transformations we previously applied for the if-else
branch example. We will apply the transformations for variable rewriting, operation rewriting and flow control recovery.
Ignoring the prologue for now, the first elementary block becomes:
var_12 = 4
var_4 = 0
var_8 = 1
jmp .label_A
The second elementary block (label_B):
var_8 = var_8 * var_4
var_4 = var_4 + 1
The third elementary block (label_A):
if (var_4 < var_12) { goto label_B }
# we can write 'goto' as a jump
# cmp var_4, var_12
# jl .label_B
The fourth elementary block:
eax = 0
We note that the third elementary block (i.e., elementary block #2) is special in that it is reachable from both b0 and b1. We can imagine CPU execution repeatedly jumping between blocks b1 and b2; therefore, we cannot join the elementary blocks without taking this information into account. Intuitively, the third elementary block acts as the exit condition of the loop, while the second elementary block is the body of the loop (i.e., what is executed at each iteration).
This insight, however, cannot be left to chance. The works that first described the control flow graph introduced a very important notion: dominator identification. A block b_i is said to dominate a block b_j if and only if every path from the start node to b_j passes through b_i.
Formally, the dominating blocks of a block B are defined as:
dom(B) = {B} union ( intersection ( for each Y in predecessors(B) ) dom(Y) )
where predecessors(B) is the set of blocks that have an arc into B. We can compute the dominator sets with a simple iterative algorithm that keeps a worklist of the elementary blocks still to be processed, starting from:

- dom(b0) = {b0}, where b0 is the initial node (this set never changes);
- dom(B) = {all elementary blocks}, initially, for every other block B.

For this case:
dom(b0) = {b0}
dom(b1) = {b0, b1, b2}
dom(b2) = {b0, b2}
dom(b3) = {b0, b2, b3}
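The iterative computation can be sketched as follows (block names as plain strings; the predecessor sets come from the edge list of this example):

```python
def dominators(blocks, preds, entry):
    """Iterative computation of dom(B) = {B} union (intersection of dom(P)
    over all predecessors P of B), repeated until a fixed point."""
    dom = {b: set(blocks) for b in blocks}  # every dom(B) starts as the full set
    dom[entry] = {entry}                    # the entry's set never changes
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

# predecessors taken from the links [ (b0 -> b2), (b1 -> b2), (b2 -> b1), (b2 -> b3) ]
preds = {"b1": ["b2"], "b2": ["b0", "b1"], "b3": ["b2"]}
dom = dominators(["b0", "b1", "b2", "b3"], preds, "b0")
print(dom)
```

Running it on this graph converges in a couple of passes, since every block's predecessor set is tiny.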
We then identify natural loops: given a backward arc from m to n (an arc whose target n dominates its source m), the natural loop of that arc is n together with the set of nodes x such that n dominates x and there is a path from x to m that does not pass through n. The only backward arc in our graph goes from elementary block b1 to elementary block b2 (indeed, b2 is in dom(b1)).

Rewriting the definition for our graph: the backward arc is b1 -> b2, so the natural loop consists of the header b2 plus the set of nodes that can reach b1 without passing through b2. The only such node is b1 itself, so the natural loop is {b1, b2}, with b2 as its header.
Ultimately, the algorithm that identifies natural loops can be described as follows:

- compute the dominator sets;
- find the backward arcs, i.e., the arcs m -> n such that n is in dom(m);
- for each backward arc m -> n, walk the graph backwards from m, stopping at n: the visited nodes, together with n, form the natural loop.
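The backward walk that collects the loop can be sketched as:

```python
def natural_loop(preds, m, n):
    """Natural loop of the back edge m -> n (n dominates m): n plus every
    node that can reach m without passing through n, found by a backward
    walk from m that stops at n."""
    loop = {n}
    stack = [m]
    while stack:
        x = stack.pop()
        if x not in loop:
            loop.add(x)
            stack.extend(preds[x])  # keep walking backwards
    return loop

# predecessor sets of the second example's graph
preds = {"b0": [], "b1": ["b2"], "b2": ["b0", "b1"], "b3": ["b2"]}
print(natural_loop(preds, "b1", "b2"))
```

For the back edge b1 -> b2 the walk visits only b1 before hitting the header, giving the loop {b1, b2}.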
Now that we have recognized the loop, with b1 as the loop body and b2 as the header block, we can proceed to rewrite the code by joining blocks b2 and b1 into a while loop.
while(var_4 < var_12){
var_8 = var_8 * var_4;
var_4 = var_4 + 1;
}
Does this while loop remind you of anything? As long as var_4 is less than var_12, we multiply var_8 by var_4: this may not be so visible at first glance, but it is the shape of the factorial function of a natural number (strictly speaking, since var_4 starts at 0, the first iteration multiplies by zero; with var_4 initialized to 1, the loop would compute var_8 = fact(var_12 - 1)). By combining the other elementary blocks with some transformations, we can find the original form of the code:
var_12 = 4;
var_4 = 0;
var_8 = 1;
while(var_4 < var_12){
var_8 = var_8 * var_4;
var_4 = var_4 + 1;
}
return 0;
To be more precise, noticing the initialization and the increment of var_4, we can rewrite the while loop as a for loop. To identify a for loop, there must be an instruction within the loop body that increments or decrements a variable used in the guard, and the variable must be initialized before the loop.
var_12 = 4;
var_8 = 1;
for(var_4 = 0; var_4 < var_12; var_4++){
var_8 = var_8 * var_4;
}
It is not always so easy to turn a while loop into a for loop; however, modern static analysis software has heuristics to check whether the body contains the classic increment or decrement of a variable used, for example, to access a structure or an array.
The final value computed by the function depends on var_12, and again we have simplified the discussion by using a constant value. As an exercise, you can check how a compiler could optimize the code (by simply placing the loop's final value directly inside var_8).
For this example, it was fairly easy to recover the while loop, since the code coincided with the idea and definition of natural loops (so we could distinguish the loop header and body without problems). Remember, however, that this is not always the case: irreducible loops are loops that “break” loop-detection algorithms. An irreducible loop has multiple entry points into the loop body; the algorithm we proposed would fail to decide which block to consider as the header. Several approaches have been developed to handle this situation; keep in mind that, for assembly code generated by a “classical” compiler, we find natural loops in the vast majority of cases.
Most of these analyses are expressed over a graph: decompilers and analyzers such as Angr, Binary Ninja, Ghidra, and IDA make constant use of software analysis algorithms applied to the control flow graph. This data structure is chosen for the versatility with which any transformation can be applied within a local block, without worrying about “side effects” outside the block being considered.
The control flow graph can be constructed for a single function or for the entire program: to keep the reading light, we focused on constructing the graph local to a function. To construct the control flow graph of an entire program, it is necessary to build all the local graphs and then traverse them from the entry point, exploring all execution branches. This is computationally onerous, since the number of paths to traverse can grow exponentially with the size of the program.
The flow control graph constructed locally to a function, while simple to manipulate, can sometimes generate unpredictable results depending on the complexity of the graph and the presence or absence of irreducible loops. Attackers who want to thwart any effort to obtain the original source code can construct flow control graphs with a large enough number of nodes to consume all of the analyst’s resources. There are then obfuscations such as control flow flattening to make any attempt to recover the original control flow useless.
In the next article, we will see how to build a flow control graph through the Ghidra API. As always, if you have any criticisms, suggestions or any other comments, please send me an email to seekbytes@protonmail.com.