The control flow graph is an important building block in static program analysis: it underpins a variety of analyses that consider the flow of a program. The “flow”, described informally, represents the evolution of the program during execution, that is, where the CPU jumps in order to continue program execution.
The graph (also called CFG for brevity) allows the first high-level elements of software to be recovered in a general way from a low- or medium-level representation (for example, assembly code or intermediate code). High-level elements include loops (while or for) and execution branches (switch, if, else) that are critical in identifying how execution evolves over time.
We will focus primarily on the value that the control flow graph has for static analysis of programs for which we have no source code. However, we should not think that its usefulness is limited to analyzing a binary: compilers use the control flow graph to implement a number of optimizations involving expressions and variables. The CFG is the starting point for analyses such as detecting the liveness of a variable or calculating when an expression can be reused.
Constructing the CFG requires no special effort from the analyst since most analysis software already implements graph retrieval from elementary blocks. However, easy solutions are not part of this blog; in this article, we will see how it can be constructed and some examples to show how useful the control flow graph is within static analysis.
Formally, the Control Flow Graph is defined as a graph <V, E>, where V represents the set of elementary blocks of a function and E represents the set of arcs connecting the elementary blocks according to the semantics of the instructions. Alternatively, we can say that the CFG is composed of a set of arcs <b1, b2>, where b1 and b2 are elementary blocks, and the pair <b1, b2> is in the CFG if and only if control can flow from elementary block b1 to elementary block b2.
For those who would like an intuitive definition of the basic structures we are using: an elementary block represents a series of sequential instructions grouped within a single structure called a block. Each elementary block cannot contain more than one flow-change instruction. A sequence of instructions forms an elementary block if, at each position, one instruction “dominates” (i.e., is always executed first) the instructions that follow it, and no other instruction is executed between two consecutive instructions of the sequence.
In other words, an elementary block is composed of a series of instructions such that only the first instruction can be a target of a block (single entry) and only the last instruction can lead to another block (single exit). Within the flow control graph, we can identify a start node that has no incoming arcs and an exit node that has no outgoing arcs.
Every static analysis tool, including decompilers, once the program's instructions have been disassembled and translated into an intermediate representation, proceeds to break them down into elementary blocks. The elementary blocks constitute the nodes of the graph, while the arcs are the links between blocks. Given a block B, the blocks to which control is transferred are called successors of B, while the blocks from which B obtains control are called its predecessors.
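As a concrete (and deliberately minimal) sketch of these definitions, the node/arc structure can be modeled as follows; the names BasicBlock and link are our own, not taken from any particular tool:

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    """An elementary block: a straight-line run of instructions."""
    instructions: list
    successors: list = field(default_factory=list)    # blocks control may flow to
    predecessors: list = field(default_factory=list)  # blocks control may come from

def link(src: BasicBlock, dst: BasicBlock) -> None:
    """Add the arc <src, dst> to the graph."""
    src.successors.append(dst)
    dst.predecessors.append(src)

# tiny usage example: b0 branches to b1 and b2, both of which fall into b3
b0, b1, b2, b3 = (BasicBlock([]) for _ in range(4))
for s, d in [(b0, b1), (b0, b2), (b1, b3), (b2, b3)]:
    link(s, d)
```

With this structure, the successors and predecessors of each block are available directly, which is all the later analyses in this article need.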
To better cast the CFG in the context of the analysis, let us proceed to detail an example of flow control graph construction. Suppose we have the following code (x86 assembly):
fun1:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
cmp DWORD PTR [rbp-4], 1
jle .L2
add DWORD PTR [rbp-4], 3
add DWORD PTR [rbp-8], 2
jmp .L3
.L2:
mov eax, DWORD PTR [rbp-8]
sub eax, 4
mov DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 5
.L3:
mov eax, 0
pop rbp
ret
fun1 is a function we call to set some values within memory addresses (particularly on the stack). We want to try to recover the original high-level code using some static analysis techniques. To do this, we have to recover the control flow, asking whether the program executes linearly or whether the CPU evaluates some conditions.
We divide the code into elementary blocks, knowing that each elementary block can contain only one flow-change instruction: we apply an algorithm that identifies the instructions that start or end a block. The steps to be performed are as follows:

- the first instruction of the function is a leader;
- any instruction that is the target of a (conditional or unconditional) jump is a leader;
- any instruction that immediately follows a jump is a leader.

Generally, instructions that cause a basic block to terminate include: conditional and unconditional jumps, return instructions, and calls to functions that never return (such as panic() or __stack_chk_fail).

Let us proceed to apply the algorithm to the set of instructions we have above. We identify the leaders:
fun1:
push rbp # leader (first instruction)
mov rbp, rsp
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
cmp DWORD PTR [rbp-4], 1
jle .L2
add DWORD PTR [rbp-4], 3 # leader (instruction follows a jump instruction)
add DWORD PTR [rbp-8], 2
jmp .L3
.L2:
mov eax, DWORD PTR [rbp-8] # leader (the instruction is targeted by jle .L2)
sub eax, 4
mov DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 5
.L3:
mov eax, 0 # leader (the instruction is targeted by jmp .L3)
pop rbp
ret
We can thus identify the following elementary blocks:
The first one:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
cmp DWORD PTR [rbp-4], 1
jle .L2
The second one:
add DWORD PTR [rbp-4], 3
add DWORD PTR [rbp-8], 2
jmp .L3
The third one:
mov eax, DWORD PTR [rbp-8]
sub eax, 4
mov DWORD PTR [rbp-4], eax
add DWORD PTR [rbp-8], 5
and then the fourth one:
mov eax, 0
pop rbp
ret
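The leader-identification rules can be sketched in a few lines; the tuple encoding of the listing below is our own simplification (label, mnemonic, operands), not a real disassembler output:

```python
# mnemonics that end a basic block (jumps, conditional jumps, returns)
TERMINATORS = {"jmp", "jle", "jl", "jg", "jge", "je", "jne", "ret"}

def find_leaders(instrs):
    """instrs: list of (label, mnemonic, operands) tuples; label may be None.
    Returns the set of indices of instructions that start a basic block."""
    leaders = {0}  # rule 1: the first instruction is a leader
    labels = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}
    for i, (_, op, arg) in enumerate(instrs):
        if op in TERMINATORS:
            if arg in labels:        # rule 2: the jump target is a leader
                leaders.add(labels[arg])
            if i + 1 < len(instrs):  # rule 3: the instruction after a jump
                leaders.add(i + 1)
    return leaders

fun1 = [
    (None, "push", "rbp"),            # 0
    (None, "mov", "rbp, rsp"),        # 1
    (None, "mov", "[rbp-4], 2"),      # 2
    (None, "mov", "[rbp-8], 5"),      # 3
    (None, "cmp", "[rbp-4], 1"),      # 4
    (None, "jle", ".L2"),             # 5
    (None, "add", "[rbp-4], 3"),      # 6
    (None, "add", "[rbp-8], 2"),      # 7
    (None, "jmp", ".L3"),             # 8
    (".L2", "mov", "eax, [rbp-8]"),   # 9
    (None, "sub", "eax, 4"),          # 10
    (None, "mov", "[rbp-4], eax"),    # 11
    (None, "add", "[rbp-8], 5"),      # 12
    (".L3", "mov", "eax, 0"),         # 13
    (None, "pop", "rbp"),             # 14
    (None, "ret", None),              # 15
]

print(sorted(find_leaders(fun1)))  # [0, 6, 9, 13]
```

Each leader starts one of the four blocks listed above; each block extends up to (but not including) the next leader.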
The division was sequential: we grouped instructions by evaluating each one individually and moving on to the next each time. We can then add the connections between the elementary blocks. For simplicity, let us number the elementary blocks starting at index zero. We need to ask: given an elementary block, where will the CPU jump at runtime?
To summarize, we have the following links:
[ (b0 -> b2), (b0 -> b1), (b1 -> b3), (b2 -> b3) ]
This array forms our flow control graph and allows us to identify an initial high-level flow change construct. We use DOT notation to get a visual picture of the graph:
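A minimal sketch of how the edge list can be turned into DOT (the graph and node names are the ones we chose above):

```python
edges = [("b0", "b2"), ("b0", "b1"), ("b1", "b3"), ("b2", "b3")]

def to_dot(edges, name="cfg"):
    """Render an edge list as a DOT digraph string."""
    lines = [f"digraph {name} {{"]
    lines += [f"    {src} -> {dst};" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)

print(to_dot(edges))
```

The resulting text can be fed to the Graphviz `dot` tool to produce the picture.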
By itself this representation is not very useful; however, we can enrich it by including, for each node (or elementary block), its assembly code.
With this graph, we can see that execution splits into two paths: there must therefore be an evaluation of a guard (i.e., a condition that, when evaluated, returns either true or false) that decides where the flow will jump. To identify this information, we take the first elementary block. There we can see a cmp instruction: in the instruction set architecture of the x86 family, cmp is the instruction used to compare two elements. The result of the comparison is stored in the rflags register (the register that contains all the flags).
cmp DWORD PTR [rbp-4], 1 -- jle .L2
is equivalent to the expression *(rbp-4) <= 1? (jle jumps when the first operand is less than or equal to the second.) This evaluation selects the next elementary block considered by the CPU (remember, however, that the CPU jumps to memory addresses; the elementary-block representation is only for ease of analysis). If the value contained within (rbp-4) is less than or equal to 1, then we switch to execution of elementary block b2; if not, we fall through to b1. The condition can also be written on the corresponding arcs leaving b0.
Let us translate each elementary block into a more C-like representation, remembering that the add and sub instructions are addition and subtraction operations, that push rbp -- mov rbp, rsp form the function prologue (we are not interested in it for now), and that DWORD PTR [rbp-#X] denotes the memory location whose address is rbp - #X.
The control flow graph illustrated above is very simple to understand; with more complex graphs, however, it can be useful to do some rewriting to arrive at code that is as understandable as possible. First we might rewrite the variables involved, removing the DWORD PTR [rbp-#X] notation. Let us proceed in this way: for every occurrence of DWORD PTR [rbp-#X], where #X is an integer, we query a table in which we record that index #X corresponds to a variable named var_#X, and we apply the transformation DWORD PTR [rbp-#X] -> var_#X. Sometimes eliminating this kind of information is dangerous and could lead to transformations that are not semantically equivalent to the source code.

We apply the rule to the first elementary block:
push rbp # ignore instruction: prologue
mov rbp, rsp # ignore instruction: prologue
mov DWORD PTR [rbp-4], 2 # becomes: mov var_4, 2
mov DWORD PTR [rbp-8], 5 # becomes: mov var_8, 5
cmp DWORD PTR [rbp-4], 1 # becomes: cmp var_4, 1
jle .L2 # kept as is
The final code turns out to be for the first elementary block:
mov var_4, 2
mov var_8, 5
cmp var_4, 1
jle .L2
For the second elementary block:
add var_4, 3
add var_8, 2
jmp .L3
For the third elementary block:
mov eax, var_8
sub eax, 4
mov var_4, eax
add var_8, 5
The fourth block needs no modification since it does not access any address of the form [rbp-#X]. Some might ask whether it would not have made more sense to apply the transformation directly on the linear code instead of constructing the control flow graph: we will see later that the graph allows us to apply these transformations very easily and in local contexts, without worrying about disastrous effects on the rest of the instructions.
The Static Single Assignment form is a very powerful intermediate representation that allows important information about the variation of variable values to be recovered. The form is currently used in static analysis, decompilation, and within optimizing compilers. As a first step in this direction, we rewrite each mov first_parameter, second_parameter as a single assignment first_parameter = second_parameter.
With enough cunning and intuition, it is possible to rewrite the previous assembly in this assignment form. To improve the output of the transformation we can also consider arithmetic or logical operations that follow a mov and share its first parameter. The rule is: mov first_parameter, second_parameter is rewritten as the assignment first_parameter = second_parameter.
The final code turns out to be for the first elementary block:
var_4 = 2
var_8 = 5
cmp var_4, 1
jle .L2
For the second elementary block:
var_4 = var_4 + 3
var_8 = var_8 + 2
jmp .L3
For the third elementary block:
eax = var_8
eax = eax - 4
var_4 = eax
var_8 = var_8 + 5
For the fourth elementary block:
eax = 0
pop rbp # we ignore this instruction: epilogue
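The per-instruction rewriting used above can be sketched as follows, assuming Intel-syntax operands separated by a comma; only mov, add, and sub are handled, everything else is left untouched:

```python
def to_assignment(line):
    """Rewrite 'mov a, b' as 'a = b', 'add a, b' as 'a = a + b',
    and 'sub a, b' as 'a = a - b'. Other instructions pass through."""
    op, _, rest = line.partition(" ")
    if op not in ("mov", "add", "sub"):
        return line  # leave cmp/jmp/ret/... untouched
    dst, src = [p.strip() for p in rest.split(",", 1)]
    if op == "mov":
        return f"{dst} = {src}"
    sign = "+" if op == "add" else "-"
    return f"{dst} = {dst} {sign} {src}"

block3 = ["mov eax, var_8", "sub eax, 4", "mov var_4, eax", "add var_8, 5"]
print([to_assignment(l) for l in block3])
```

Because each block is straight-line code, this rewrite can be applied block by block without looking at the rest of the graph.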
We note that, particularly for the third elementary block, another transformation can be applied to improve the readability of the code. When we find an instruction that computes an arithmetic or logical operation immediately after an assignment, and the first_parameter is shared, we can fold the operation into the assignment.
The third elementary block becomes thus:
var_4 = var_8 - 4
var_8 = var_8 + 5
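This folding step can be sketched as a tiny substitution pass over a straight-line block; treating eax as the only temporary is an assumption valid for this example:

```python
def fold_temporaries(stmts, temps=("eax",)):
    """Substitute temporary registers through a straight-line block of
    'dst = expr' statements, dropping assignments to dead temporaries."""
    env, out = {}, []
    for s in stmts:
        dst, expr = [p.strip() for p in s.split("=", 1)]
        # replace each token that names a known temporary with its value
        expr = " ".join(env.get(tok, tok) for tok in expr.split())
        if dst in temps:
            env[dst] = expr           # remember the temporary's current value
        else:
            out.append(f"{dst} = {expr}")
    return out

block3 = ["eax = var_8", "eax = eax - 4", "var_4 = eax", "var_8 = var_8 + 5"]
print(fold_temporaries(block3))  # ['var_4 = var_8 - 4', 'var_8 = var_8 + 5']
```

This is only safe inside a single block: the temporary must not be live at the block's exit, which is exactly the kind of fact the CFG lets us check locally.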
In the last transformation we try to recover the control flow of the code, to rewrite it with the constructs we are used to seeing: in this case, if and else. The instructions that remain to be inspected are the cmp and the conditional or unconditional jumps.
The compare instruction (cmp) in x86 is semantically a subtraction between the two operands to be compared. For example, cmp eax, 1 performs the subtraction between eax and 1. As a side effect, the subtraction sets some flag bits within the rflags register: these flags are what conditional jumps consult (each conditional jump specifies the flags to be considered).
Let us focus our attention on the first elementary block so that we can understand how the jump works. It consists of a series of assignment instructions followed by:
cmp var_4, 1
jle .L2
The compare instruction performs a subtraction between var_4 and 1. This may seem like a choice that makes no sense: why is it necessary to perform a subtraction? Could not the designers of Intel's ISA directly add instructions such as cmpe or cmple? Looking at the sign of the result of the subtraction, we can identify three different conditions:

- var_4 is less than 1;
- var_4 is greater than 1;
- var_4 is equal to 1.

You can combine the conditions of the SIGN flag and the ZERO flag to figure out whether a number is greater than or equal, or less than or equal. The instruction pair we showed earlier can thus be rewritten as if (var_4 <= 1), with the conditional jump's argument representing the address to jump to when the condition is true (i.e., elementary block #2). What happens, however, if the condition is false? The CPU simply continues to execute instructions sequentially: looking at the code in its entirety, it will proceed with the instructions contained in elementary block #1.
Returning to the instructions in the first elementary block, let us state in natural language the transformation we want to apply: when an instruction is a cmp and the next is a conditional jump, we write if (condition) { label }, where the condition's operands are those of the cmp and the comparison operator is given by the type of the conditional jump (le for Less or Equal, g for Greater, …). Between the curly brackets we put the number of the elementary block to jump to.

The two instructions then become if (var_4 <= 1) { b2 }. Since the other elementary block (i.e., the false branch) is reachable only if (var_4 > 1), we can use the else construct to represent the false branch of the guard. The final code is if (var_4 <= 1) { b2 } else { b1 }.
A further transformation is the union of all the elementary blocks, to try to reassemble the initial code from which we started. We must be careful to place the elementary blocks in the correct positions: following the transformation of the if-else branch, block #2 goes inside the true branch of the if and block #1 inside the false branch. We also add final semicolons to make the result more C-like.
var_4 = 2;
var_8 = 5;
if (var_4 <= 1) {
var_4 = var_8 - 4;
var_8 = var_8 + 5;
} else {
var_4 = var_4 + 3;
var_8 = var_8 + 2;
}
eax = 0;
However, it would be odd to leave the last statement referencing a register: why is the eax register set to 0 if it is not used afterwards? In fact, one detail has been omitted, and it concerns the main feature of this assembly code: it was extracted from a function (you can recognize that an assembly listing is a function by identifying the prologue and epilogue!). The x86 calling convention specifies that the return value of a function is always placed in the eax register. If the last instruction is eax = value, we can replace it with return value. Ultimately, the code results:
int fun1() {
var_4 = 2;
var_8 = 5;
if (var_4 <= 1) {
var_4 = var_8 - 4;
var_8 = var_8 + 5;
} else {
var_4 = var_4 + 3;
var_8 = var_8 + 2;
}
return 0;
}
This example has no particular significance at the level of code semantics, but we can notice an important feature that was perhaps more hidden before, concerning the values of the variables. Since we know the values of the variables a priori, and since those values are constant, we can apply optimization techniques to simplify the instructions (the if condition will always be false!). This was not obvious to us because low-level code is very hard to read; a compiler, however, through the division into elementary blocks, could have noticed the always-false condition, dropped elementary block #2, and eliminated the two execution branches.

The (trivial) example makes clear the importance of partitioning into elementary blocks and applying local transformations within blocks. Every time we compile some source code, the compiler applies optimization techniques based precisely on the control flow graph, such as eliminating dead code (unused code) or reordering instructions. To add further value to the use of the control flow graph, a more complex example follows.
Let us look at a more complex example to analyze the flow control graph and obtain information with respect to loops. The goal is to reconstruct the original code and see if we can also derive what this particular function does with respect to one or more locally used parameters.
f2:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-12], 4
mov DWORD PTR [rbp-4], 0
mov DWORD PTR [rbp-8], 1
jmp .label_A
.label_B:
mov eax, DWORD PTR [rbp-8]
imul eax, DWORD PTR [rbp-4]
mov DWORD PTR [rbp-8], eax
add DWORD PTR [rbp-4], 1
.label_A:
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-12]
jl .label_B
mov eax, 0
pop rbp
ret
We apply the division into elementary blocks and construct the flow control graph. The first elementary block turns out to be:
push rbp # prologue
mov rbp, rsp # prologue
mov DWORD PTR [rbp-12], 4
mov DWORD PTR [rbp-4], 0
mov DWORD PTR [rbp-8], 1
jmp .label_A
The second elementary block turns out to be (label_B):
mov eax, DWORD PTR [rbp-8]
imul eax, DWORD PTR [rbp-4]
mov DWORD PTR [rbp-8], eax
add DWORD PTR [rbp-4], 1
The third elementary block (label_A):
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-12]
jl .label_B
And the fourth:
mov eax, 0
pop rbp
ret
Examining the blocks one by one to understand the connections between them, we obtain the following links:
[ (b0 -> b2), (b1 -> b2), (b2 -> b1), (b2 -> b3) ]
Applying DOT notation, we can represent the graph in this way:
This representation does not convey much information; we can enrich it by including, for each node (or elementary block), the assembly code associated with it.
In this case it is less obvious how to recover the control flow; however, compared with the previous example, we note that the graph contains a cycle (given by the arcs b1 -> b2 and b2 -> b1). This feature suggests that the initial code contains a loop: a portion of code that is executed multiple times under the evaluation of a specific guard.
The next step is to apply the transformations we previously applied for the if-else
branch example. We will apply the transformations for variable rewriting, operation rewriting and flow control recovery.
Ignoring the prologue for now, the first elementary block becomes:
var_12 = 4
var_4 = 0
var_8 = 1
jmp .label_A
The second elementary block (label_B):
var_8 = var_8 * var_4
var_4 = var_4 + 1
The third elementary block (label_A):
if (var_4 < var_12) { goto label_B }
# we can write 'goto' as a jump
# cmp var_4, var_12
# jl .label_B
The fourth elementary block:
eax = 0
We note that the third elementary block (i.e., elementary block #2) is special in that it is reachable from both b0 and b1. We can imagine CPU execution repeatedly jumping between blocks b1 and b2; therefore, we cannot join the elementary blocks without taking this information into account. Intuitively, the third elementary block acts as the exit condition of the loop, while the second elementary block is the body of the loop (i.e., what is executed at each iteration).
This insight, however, cannot be left to chance. The works that first described the control flow graph introduced a very important notion: dominator identification. A block b_i is said to dominate a block b_j if and only if every path from the start node to b_j passes through b_i.
Formally, the dominating blocks of a block B are defined as:
dom(B) = {B} union ( intersection ( for each Y in predecessors(B) ) dom(Y) )
where predecessors(B) is the set of blocks that have an arc into B. We can compute the dominator sets with a simple iterative algorithm that keeps a worklist of the elementary blocks still to be processed, starting from:

- dom(b0) = {b0}, where b0 is the initial node (this set never changes);
- dom(B) = {all elementary blocks}, initially, for every other block B.

For this case:
dom(b0) = {b0}
dom(b1) = {b0, b1, b2}
dom(b2) = {b0, b2}
dom(b3) = {b0, b2, b3}
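The iterative computation can be sketched as follows (block names as plain strings; the predecessor sets come from the edge list of this example):

```python
def dominators(blocks, preds, entry):
    """Iterative computation of dom(B) = {B} union (intersection of dom(P)
    over all predecessors P of B), repeated until a fixed point."""
    dom = {b: set(blocks) for b in blocks}  # every dom(B) starts as the full set
    dom[entry] = {entry}                    # the entry's set never changes
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

# predecessors taken from the links [ (b0 -> b2), (b1 -> b2), (b2 -> b1), (b2 -> b3) ]
preds = {"b1": ["b2"], "b2": ["b0", "b1"], "b3": ["b2"]}
dom = dominators(["b0", "b1", "b2", "b3"], preds, "b0")
print(dom)
```

Running it on this graph converges in a couple of passes, since every block's predecessor set is tiny.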
We then identify natural loops: given a backward arc from m to n (an arc whose target n dominates its source m), the natural loop of that arc is n together with the set of nodes x such that n dominates x and there is a path from x to m that does not pass through n. The only backward arc in our graph goes from elementary block b1 to elementary block b2 (indeed, b2 is in dom(b1)).

Rewriting the definition for our graph: the backward arc is b1 -> b2, so the natural loop consists of the header b2 plus the set of nodes that can reach b1 without passing through b2. The only such node is b1 itself, so the natural loop is {b1, b2}, with b2 as its header.
Ultimately, the algorithm that identifies natural loops can be described as follows:

- compute the dominator sets;
- find the backward arcs, i.e., the arcs m -> n such that n is in dom(m);
- for each backward arc m -> n, walk the graph backwards from m, stopping at n: the visited nodes, together with n, form the natural loop.
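The backward walk that collects the loop can be sketched as:

```python
def natural_loop(preds, m, n):
    """Natural loop of the back edge m -> n (n dominates m): n plus every
    node that can reach m without passing through n, found by a backward
    walk from m that stops at n."""
    loop = {n}
    stack = [m]
    while stack:
        x = stack.pop()
        if x not in loop:
            loop.add(x)
            stack.extend(preds[x])  # keep walking backwards
    return loop

# predecessor sets of the second example's graph
preds = {"b0": [], "b1": ["b2"], "b2": ["b0", "b1"], "b3": ["b2"]}
print(natural_loop(preds, "b1", "b2"))
```

For the back edge b1 -> b2 the walk visits only b1 before hitting the header, giving the loop {b1, b2}.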
Now that we have recognized the loop, with b1 as the loop body and b2 as the header block, we can proceed to rewrite the code by joining blocks b2 and b1 into a while loop.
while(var_4 < var_12){
var_8 = var_8 * var_4;
var_4 = var_4 + 1;
}
Does this while loop remind you of anything? As long as var_4 is less than var_12, we multiply var_8 by var_4: this may not be so visible at first glance, but it is the shape of the factorial function of a natural number (strictly speaking, since var_4 starts at 0, the first iteration multiplies by zero; with var_4 initialized to 1, the loop would compute var_8 = fact(var_12 - 1)). By combining the other elementary blocks with some transformations, we can find the original form of the code:
var_12 = 4;
var_4 = 0;
var_8 = 1;
while(var_4 < var_12){
var_8 = var_8 * var_4;
var_4 = var_4 + 1;
}
return 0;
To be more precise, noticing the initialization and the increment of var_4, we can rewrite the while loop as a for loop. To identify a for loop, there must be an instruction within the loop body that increments or decrements a variable used in the guard, and the variable must be initialized before the loop.
var_12 = 4;
var_8 = 1;
for(var_4 = 0; var_4 < var_12; var_4++){
var_8 = var_8 * var_4;
}
It is not always so easy to turn a while loop into a for loop; however, modern static analysis software has heuristics to check whether the body contains the classic increment or decrement of a variable used, for example, to access a structure or an array.
The final value computed by the function depends on var_12, and again we have simplified the discussion by using a constant value. As an exercise, you can check how a compiler could optimize the code (by simply placing the loop's final value directly inside var_8).
For this example, it was fairly easy to recover the while loop, since the code coincided with the idea and definition of natural loops (so we could distinguish the loop header and body without problems). Remember, however, that this is not always the case: irreducible loops are loops that “break” loop-detection algorithms. An irreducible loop has multiple entry points into the loop body; the algorithm we proposed would fail to decide which block to consider as the header. Several approaches have been developed to handle this situation; keep in mind that, for assembly code generated by a “classical” compiler, we find natural loops in the vast majority of cases.
Most of these analyses are expressed over a graph: decompilers and analyzers such as Angr, Binary Ninja, Ghidra, and IDA make constant use of software analysis algorithms applied to the control flow graph. This data structure is chosen for the versatility with which any transformation can be applied within a local block, without worrying about “side effects” outside the block being considered.
The control flow graph can be constructed for a single function or for the entire program: to keep the reading light, we focused on constructing the graph local to a function. To construct the control flow graph of an entire program, it is necessary to build all the local graphs and then traverse them from the entry point, exploring all execution branches. This is computationally onerous, since the number of paths to traverse can grow exponentially with the size of the program.
The flow control graph constructed locally to a function, while simple to manipulate, can sometimes generate unpredictable results depending on the complexity of the graph and the presence or absence of irreducible loops. Attackers who want to thwart any effort to obtain the original source code can construct flow control graphs with a large enough number of nodes to consume all of the analyst’s resources. There are then obfuscations such as control flow flattening to make any attempt to recover the original control flow useless.
In the next article, we will see how to build a flow control graph through the Ghidra API. As always, if you have any criticisms, suggestions or any other comments, please send me an email to seekbytes@protonmail.com.