- Our aim is to build a control flow graph and discover all loops in the program.
- Loops are difficult to see in the intermediate code because all structure has been lost.
- The control flow graph (CFG) is used for data flow analysis as well as other optimizations.

- We consider two approaches for control flow analysis:
- Using dominators to discover loops.
- Sufficient when using an iterative data flow analysis

- Using interval analysis.
- This includes a series of methods that analyze the overall structure of the routine and that decompose it into nested regions called intervals
- The nesting structure of the intervals forms a tree called a control tree, that is useful in structuring and speeding up data flow analysis.

- Using dominators to discover loops.
- The interval method is superior in three ways:
- It is faster at performing the actual data flow analysis.
- It makes it easier to update already computed data flow information in response to changes to a program.
- It is easier to perform control flow transformations.

- Nodes in the control flow graph are generally basic blocks.
- A
*basic block*is a maximal sequence of instructions that can be entered only at the first instruction and exited only from the last instruction. - This implies that once any instruction in the basic block is executed, they all have to be executed.
- This first instruction in the basic block can be:
- Entry point to a routine.
- A target of a branch.
- An instruction immediately following a branch or a return.

- The first instruction in a basic block is called a
*leader*. - To determine the basic blocks, first identify all leaders. Then, for each leader, include in its basic block all the instructions from the leader to the next leader or the end of the routine.
- Typically, call statements are not considered to be jumps. There are exceptions, but we won't worry about them.
- Given the basic blocks, we then construct the control flow graph by adding directed edges to the nodes (basic blocks) when control passes from one basic block to another.
- There are two distinguished nodes called
`entry`and`exit`. Control enters the control flow graph through the`entry`node and exits through the`exit`node. - The
`entry`and`exit`nodes make things easier because we have a unique entry point and a unique exit point from the graph. - As an example of the above technique, consider
Figures 3.1-3.3.

- An
*extended basic block*is a maximal sequence of instructions beginning with a leader that contains no join node, other than its first node. - The CFG for Figure 3.3 using extended basic blocks is shown
in Figure 3.4.

- The algorithm for extended basic blocks is fairly simple and is given
in the book.

- We consider four graph theoretic concepts that are important to several of the algorithms used below.
- They are:
- Depth first search.
- Preorder traversal.
- Postorder traversal.
- Breadth first search.

- A depth first search visits the descendants of a node in the graph before visiting any of its siblings that are not also its descendents.
- Consider the control flow graph in Figure 3.5(a) and its depth
first presentation shown in Figure 3.5(b).

- The algorithm is straightforward and is shown in the book.
- The
*depth first presentation*includes all the graph's nodes and the edges that make up the depth first order displayed as a tree (called a*depth first spanning tree*) and the other edges (ones not part of the depth first order) displayed in such a way as to distinguish them from the tree edges. - Edges that are not part of the depth first spanning tree are divided
into three classes:
*Forward edges*- edges that go from a node to a direct descendant in the spanning tree, but not along a tree edge.*Back edges*- edges that go from a node to one of its ancestors in the tree.*Cross edges*- edges that connect nodes that that neither is an ancestor of the other.

- The preorder and postorder traversals induce an order on the nodes of a graph.
- A
*preorder traversal*of the graph*G*is a traversal in which each node is processed before its descendants (disregarding back edges). For Figure 3.5(a), the preorder traversal is 1, 2, 3, 4, 5, 7, 8, 9, 10, 6. - A
*postorder traversal*of the graph*G*is a traversal in which each node is processed after its descendants (disregarding back edges). For Figure 3.5(a), the postorder traversal is 9, 10, 8, 7, 5, 6, 4, 3, 2, 1. - A
*breadth first search*visits the immediate descendants before any of the unvisited descendants. For Figure 3.5(a), the breadth first search is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. - The algorithm for breadth first search is simple and is given in the
book.

- Node
*d**dominates*node*i*, written , if every possible execution path (in the CFG) from`entry`to*i*includes*d*. - The relation
*dom*has the following properties:*Reflexive*- every node dominates itself.*Transitive*- if and , then .*Antisymmetric*- if and , then*b*=*a*.

- Node
*a**immediately dominates*node*b*, if , if and there does not exist a node*c*such that and for which and . - Obviously an immediate dominator is unique.
- Node
*p**postdominates*node*i*, written , if every possible execution path (in the CFG) from*i*to`exit`includes*p*. - One approach to finding dominators is that node
*a*dominates node*b*if and only if*a*=*b*, or*a*is the unique immediate predecessor of*b*, or*b*has more than one immediate predecessor and for all immediate predecessors*c*of*b*, and*a*dominates*c*. - The algorithm for this approach is shown in Figure 3.6.

- Note the line in the algorithm marked with *. If the nodes are considered in depth first order, this algorithm runs most efficiently. Why is this true?
- Applying the algorithm in Figure 3.6 to
Figure 3.5(a), we get the following results. Please note that
Domin(
*i*) contains the set of all nodes that dominate node*i*.**Node****Domin(***i*)1 {1} 2 {1, 2} 3 {1, 3} 4 {1, 3, 4} 5 {1, 3, 4, 5} 6 {1, 3, 4, 6} 7 {1, 3, 4, 7} 8 {1, 3, 4, 7, 8} 9 {1, 3, 4, 8, 9} 10 {1, 3, 4, 8, 10} - We are also shown an algorithm for finding immediate dominators, and that is left as an exercise for you.
- There is a faster, but more complicated method for finding dominators,
developed by Lengauer and Tarjan covered in the book, and you can look at it
for your own interest.

- A
*back edge*in a CFG is one whose*head*dominates its*tail*. Consider the edge ,*y*is the head and*x*is the tail. - Note that this definition of back edge is more restrictive than the one given in relationship to a depth first search.
- For example, consider the CFG shown in Figure 3.7(a) and the
depth first presentation in Figure 3.7(b).

The back edge in the figure is not a back edge as defined above. - The back edges in Figure 3.5(a) are:
**Back edges** - The
*natural loop*formed using the back edge contains those nodes that can reach*m*without going through*n*. - Node
*n*is the*loop header*. - The algorithm for finding the natural loop, given a back edge, is
shown in Figure 3.8.

- For the back edges given above, the natural loops for
Figure 3.5(a) are shown below.
**Back edge****Natural loop**{4, 5, 6, 7, 8, 10} {7, 8, 10} {3, 4, 5, 6, 7, 8, 10} {3, 4, 5, 6, 7, 8, 10} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} - Many optimizations that we consider move code from inside the loop to just before the loop header.
- In order to have a place to put this code, we introduce a
*preheader*that is a block (initially empty) placed just before the loop such that all edges that previously went to the header from outside the loop now go to the preheader, and there is a single new edge from the preheader to the loop header. - Natural loops handle any loop that can occur when structured programming techniques are used.
- If goto's are used, then a more general technique is needed using
*strongly connected components*. This is discussed in the book and left as an exercise for you.

- There are transformations that can be applied to flow graphs in order to collapse the flow graphs into a single node.
- If a flow graph can be reduced to a single node, the flow graph is
said to be
*reducible*. - Any flow graph created using
*structured programming techniques*is reducible. - All loops in a reducible flow graph are natural loops.
- In a reducible flow graph there are no jumps into the middle of loops.
- There are control flow patterns that make flow graph irreducible. One such pattern is shown in Figure 3.7(a).
- Studies have been done of many real life programs, and even including languages that allow non-structured control, well over 90% of the programs are reducible.
- There are three practical approaches to dealing with irreducibility
in control flow graphs to data flow analysis:
- Perform iterative data flow analysis.
- Perform
*node splitting*that transforms irreducible regions into reducible one. An example of this is shown in Figure 3.9.

- Perform an induced iteration on the lattice of monotone functions from the lattice to itself.

- Interval analysis refers to dividing up the flow graph into regions of various sorts, consolidating each region into a new node, and replacing the edges entering of leaving the region with edges entering or leaving the corresponding abstract node.
- Since the transformations are applied either one at a time or to
disjoint subgraphs in parallel, the resulting regions are
*nested*in the sense that each abstract node corresponds to a subgraph. - The result of applying a sequence of transformations produces a
*control tree*, defined as follows:- The root of the control tree is an abstract graph representing the original flow graph.
- The leaves of the control tree are individual basic blocks.
- The nodes between the root and the leaves are abstract nodes representing regions of the flow graph.
- The edges of the tree represent the relationship between each abstract node and the regions that are descendants.

- One example of transformations is T1-T2 analysis.
- The T1 and T2 transformations are show in Figure 3.10.

- Consider the example control flow graph in Figure 3.11, the
sequence of T1 and T2 transformations, and the resulting control tree.

- Interval analysis can now be done using the following algorithm:
- 1.
- Perform a postorder traversal of the node set of the flow graph,
looping for loop headers and headers of improper regions
^{1.1}. - 2.
- For each loop header found, construct its natural loop and reduce it to an abstract region of type ``natural loop.''
- 3.
- For each set of entries of an improper region
^{1.2}, construct the minimal strongly connected component of the flow graph containing all the entries and reduce it to an abstract region of type ``improper region.'' - 4.
- For the
`entry`node and for each immediate descendant of a node in a natural loop or in an irreducible region, construct the maximal acyclic graph with that node as its root; if the resulting graph has more than one node in it, reduce it to an abstract region of type ``acyclic region.'' - 5.
- Iterate this process until it terminates.

- This type of analysis deals with reducing high level control structures into single nodes.
- There are more patterns that can be matched, instead of just two as in T1-T2 analysis. In fact, there is one for each high level control construct.
- Given this structure, it is possible to simplify data flow analysis.
- The rest of the details are left to the student.

- ... regions
^{1.1} - Not considered.
- ... region
^{1.2} - Not considered.