'How can I write an LLVM pass to detect redundant conditions in C / C + +

I want to write an LLVM pass to detect redundant condition pattern like this in c++.

int a = ... , b = ..., c = ...
//first if condition
if(a == b + 1 - c){
    ...//compoundStmt1
}
// not change the val of a, b ,c
...
//second if condition which is equivalent to first one
if(c == b - a + 1){ // redundant condition 
     ... //compoundStmt2
}

If I am able to find the equivalent condition, I may merge compoundStmt1 and CompoundStmt2.(That's my target!)

For this case, my idea is to find all conditional statements in C + + on CFG by checking whether the last sentence is a conditional branch jump instruction and the condition is ICMP or FCMP instruction, and add them to subsequent nodes, add the conditional constraints of the current node, and continue to spread to successors.

But I thought later that for the second if statement, it would actually be added a == b + 1 - c and a != b + 1 - c, which are actually equivalent to not adding them. How should I deal with it and judge that the second encounter of c == b - a + 1 is repeated with the first one before to find out this redundant condition case.

====================

for complicated case(such as string's "==" operator call), how can I do something to check that.

string s = "...."
//first if stmt
if(s == "abc"){
   ...
}
// not reassign s
...
...
//second if stmt with same condition 
if("abc" == s){ // redundant condition
   ...
}


Solution 1:[1]

In fact the first case is much harder to do:

a == b + 1 - c

c == b - a + 1

In LLVM IR the right side would have only two operands for arithmetic operations. So for simplicity let's assume we need to establish equivalence between:

a == b - c

c == b - a

In LLVM this could look like:

%a1 = sub %b, %c
%cmp1 = ceq %a, a1

%c1 = sub %b, %a
%cmp2 = ceq %c, c1

And the goal is to establish that %cmp1 and %cmp2 are identical. You can think of the operations in terms of DAG (Directed Acyclic Graphs) or Trees in most cases e.g.,

      cmp1(ceq)
     /         \
    %a1(sub)    %a
   /  \
  %b  %c


      cmp2(ceq)
     /         \
    %c1(sub)    %c
   /  \
  %b  %a

To establish equivalence you need to make a set of tree transforms that makes both trees identical. There are algorithms to establish Tree equivalence and, unsurprisingly, first proposed by the authors of the Dragon Book. For simple cases like small trees (<=5 nodes) and limited operands(arithmetic), you can just choose some heuristics to find equivalence. In case the sub-graph can't be reduced to a Tree and you are stuck with a DAG, it would be wiser to bail out in the interest of compile time as DAG isomorphism isn't cheap to establish

Even after equivalence is established, you need to check for certain invariants like dominance, non-intersecting side-effects etc. I have described few things below for the second case.

For the second case

The problem is there is no built in operator to compare strings. So the equality will look like a function call and it becomes a non-trivial problem then. But assuming you only want to compare scalars like int. e.g.,

s == 10
// s is not modified
10 == s

In LLVM IR this may look like:

%t1 = 10
%c1 = ceq %s, %t1 # ceq = compare equal
// s is not modified.

%t2 = 10
%c2 = ceq %t2, %s

The first step is to make sure that s is not modified.

In some cases you may get this for 'free' in SSA representation. SSA is a declarative representation, so each variable is defined once.

The second step is to do 'constant propagation'.

So %t2 will removed after that.

%t1 = 10
%c1 = ceq %s, %t1
// s is not modified.
%c2 = ceq %t1, %s

Third step is to canonicalize the operands.

You may sort the operands in some order e.g., lexicographic order

%t1 = 10
%c1 = ceq %s, %t1
// s is not modified.
%c2 = ceq %s, %t1

Fourth step is to 'hash' the instructions.

With hash you can find the identical operations. Compilers use 'GVN' as hashing mechanism. GVN also accounts for ordering as described in step 3. So with GVN you can skip the third step.

Fifth step is to establish 'dominance'.

Essentially, all paths from the beginning of the function to %c2 must pass through %c1. Then only one can replace %c2 with %c1. There are APIs which makes this check trivial.

In case we want to handle the original problem of detecting that the string comparisons are equivalent, we need to hard-code the semantics of strcmp. For example, knowing that strcmp(a, b) and strcmp(b, a) will return the same value (0) when a == b. And then performing 'canonicalization of operations followed by remaining steps described above; it is tricky but doable.

References: GVN-PRE is an optimization in LLVM that does what I described for the second case. There might be more checks needed for anticipability, and non-intersecting side effects, depending on the operand+operation types. What I described here is a very simple example.

GVNHoist is another optimization that has similar checks you can find useful.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1