'How to convert a Control Flow Graph back into its source code? e.g., C/C++
I have already generated the corresponding CFG of my source code. Then, I want to modify the CFG (merge some nodes). In then end, I need to convert the modified CFG back into a corresponding source code. How could I do that? I am using LLVM at this point.
Solution 1:[1]
This is a deceptively difficult problem. Usually the CFG is simplified so that the original source code cannot be reconstructed. For example, while and for-loops are replaced with tests and conditional jump instructions. I'm sure that is the case for LLVM:s intermediate form so you are out of luck there.
However, if the CFG has not been simplified it is possible to recreate the program's abstract syntax tree (AST). To do that you need to have the basic blocks of the CFG in program order. This is not difficult since that is the natural order you get when creating the CFG. You also need to precompute every basic block's immediate dominator. You can find an algorithm for computing that in the article A Simple, Fast Dominance Algorithm. Then the algorithm proceeds as follows:
- For every block except for the first one:
- Check whether this block is the direct child of its immediate
dominator. I.e, whether this block is the first block in a
while, for, or if-else clause.
- If yes, add the block to its immediate dominator's children.
- If no, mark this block as its immediate dominator's follower.
- Check whether this block is the direct child of its immediate
dominator. I.e, whether this block is the first block in a
while, for, or if-else clause.
- Use the collected data to build an AST as follows:
- Traverse the follower data from the first block in the CFG until
the last block which has no follower.
- For every block that is a for, if, or while, recurse from step 3 but start with the first and second child (in case of else) of the block.
Here is some Python code that implements the algorithm:
from ast import *
class BasicBlock:
def __init__(self, insns):
self.insns = insns
def type(self):
return type(self.insns[-1]) if self.insns else None
def build_tree(bb, foll, children):
nodes = []
while bb:
if not bb.insns:
break
last_insn = bb.insns[-1]
childs = children[bb]
tp = bb.type()
if tp in (For, If, While):
last_insn.body = \
build_tree(childs[0], foll, children)
if len(childs) == 2:
last_insn.orelse = \
build_tree(childs[1], foll, children)
nodes.extend(bb.insns)
bb = foll.get(bb)
return nodes
def is_dom_child(bb, dom_bb, succ):
childs = succ[dom_bb]
if bb in childs:
tp = dom_bb.type()
if tp in (If, For, While) and childs[0] == bb:
return True
if (tp == If and bb == childs[1] and
not reachable(succ, childs[0], childs[1], {dom_bb})):
return True
return False
def bbs_to_ast(bbs, succ, idoms):
foll = {}
children = defaultdict(list)
for bb in bbs[1:]:
dom_bb = idoms[bb]
if is_dom_child(bb, dom_bb, succ):
children[dom_bb].append(bb)
else:
foll[dom_bb] = bb
nodes = build_tree(bbs[0], foll, children)
return Module(nodes, [])
def print_bbs(bbs, succ, idoms):
mod = bbs_to_ast(bbs, succ, idoms)
return unparse(mod)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Björn Lindqvist |