Generating a call graph with clang's -dot-callgraph with multiple cpp files, and a sed command

I tried Doxygen, but it was a bit slow, and it generated a lot of irrelevant individual dot files, so I'm pursuing the clang way to generate a call graph.

This answer https://stackoverflow.com/a/5373814/414063 posted this command:

$ clang++ -S -emit-llvm main1.cpp -o - | opt -analyze -dot-callgraph
$ dot -Tpng -ocallgraph.png callgraph.dot

and then

$ clang++ -S -emit-llvm main1.cpp -o - |
   opt -analyze -std-link-opts -dot-callgraph
$ cat callgraph.dot | 
   c++filt | 
   sed 's,>,\\>,g; s,-\\>,->,g; s,<,\\<,g' | 
   gawk '/external node/{id=$1} $1 != id' | 
   dot -Tpng -ocallgraph.png

I managed to get the .dot files and demangle them with c++filt, but the symbols contain a lot of "noise", for example:

"{__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::deallocate(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, unsigned long)}"
"{void std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >::destroy<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >(std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >*)}"
"{void __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::destroy<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >*)}"

How does doxygen manage to "simplify" those symbols? Is there anything else other than STLfilt?

How can I properly filter symbols that are not relevant to my code, like allocators, constructors for containers? What does this sed and gawk command attempt to do? I tried them but I could not really see what they did.

c++ clang++ call-graph

Solution 1:^[1]

I managed to do it, but it was not really trivial, and clang doesn't really provide options to filter "noisy" symbols.

It is important to know that graphviz cannot magically optimize the layout of a graph, so it's better to generate one graph per object file.

Here is the Python filter I came up with to remove a lot of the noise. Various things that are not part of std:: also get filtered, like sfml or nlohmann (a heavily templated json library that generates a lot of symbols). I did not use regexes, as plain prefix and substring checks were enough. These filters will vary a lot depending on your code, the libraries you use, and possibly which parts of the standard library you use, since "it's just templates all the way down".

def filtered(s):
    # returns True when the demangled symbol s should be KEPT (it is not noise)
    return not (
        s.startswith("& std::__get_helper")
        or s.startswith("__cx")
        or s.startswith("__gnu_cxx::")
        or s.startswith("bool nlohmann::")
        or s.startswith("bool std::")
        or s.startswith("decltype")
        or s.startswith("int* std::")
        or s.startswith("int** std::")
        or s.startswith("nlohmann::")
        or s.startswith("sf::")
        or s.startswith("std::")
        or s.startswith("void __gnu_cxx::")
        or s.startswith("void format<")
        or s.startswith("void nlohmann::")
        or s.startswith("void std::")
        or 'std::remove_reference' in s
        or 'nlohmann::' in s
        or '__gnu_cxx::' in s
        or 'std::__copy_move' in s
        or 'std::__niter' in s
        or 'std::__miter' in s
        or 'std::__get_helper' in s
        or 'std::__uninitialized' in s
        or 'sf::operator' in s
        or s == 'sf::Vector2<float>::Vector2()'
        or s == 'sf::Vector2<float>::Vector2(float, float)'
        or s == 'sf::Vector2<float>::Vector2<int>(sf::Vector2<int> const&)'
        or s == 'sf::Vector2<int>::Vector2()'
        or s == 'sf::Vector2<int>::Vector2(int, int)'
        )
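
The same kind of filter can be written more compactly, because str.startswith accepts a tuple of prefixes. This is only a sketch: the names NOISY_PREFIXES, NOISY_SUBSTRINGS and keep_symbol are hypothetical, and the lists are a shortened subset of the filter above that you would adapt to your own code.

```python
# Hypothetical, shortened lists; adapt them to your own code base.
NOISY_PREFIXES = (
    "std::", "void std::", "bool std::",
    "__gnu_cxx::", "nlohmann::", "sf::",
)
NOISY_SUBSTRINGS = (
    "std::__niter", "std::__miter",
    "nlohmann::", "__gnu_cxx::",
)


def keep_symbol(label):
    """Return True when the demangled symbol should stay in the graph."""
    # str.startswith accepts a tuple and checks every prefix in it.
    if label.startswith(NOISY_PREFIXES):
        return False
    return not any(noise in label for noise in NOISY_SUBSTRINGS)


if __name__ == "__main__":
    for label in ("main", "std::vector<int>::size() const", "Game::update(float)"):
        print(label, keep_symbol(label))
```

Keeping the patterns in data rather than in a long boolean expression makes it easier to tune the lists per project.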

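Second, I also removed symbols that do not appear in any remaining call, and calls to nodes absent from the object file. Concretely, I just cross-checked the nodes and edges of the generated DOT file (nodes is a list of (name, label) pairs, edges is a list of (source, destination) pairs):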

# filtering symbols I don't want
nodes_filtered = [(name, label) for (name, label) in nodes if filtered(label)]

# using a set() for further cross checking
nodes_filt_ids = {name for (name, label) in nodes_filtered}

# we only keep edges with symbols (source and destination) BOTH present in the list of nodes
edge_filtered = [(a,b) for (a,b) in edges if a in nodes_filt_ids and b in nodes_filt_ids]

# we then build a set() of all the nodes from the list of edges
nodes_from_filtered_edges = {name for edge in edge_filtered for name in edge}

# we then REFILTER AGAIN from the list of filtered edges
nodes_refiltered = [(name, label) for (name, label)
    in nodes_filtered if name in nodes_from_filtered_edges]
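
To make the cross-checking concrete, here is a small self-contained run on toy data. The node ids, the labels, the stand-in filtered function and the prune helper are all made up for illustration; only the filtering steps mirror the snippet above.

```python
# Toy data: (node id, demangled label) pairs and (caller id, callee id) pairs.
nodes = [
    ("n1", "main"),
    ("n2", "Game::update(float)"),
    ("n3", "std::vector<int>::size() const"),
    ("n4", "Game::unused()"),
]
edges = [("n1", "n2"), ("n2", "n3")]


def filtered(label):
    # Stand-in for the real filter: keep everything outside std::.
    return not label.startswith("std::")


def prune(nodes, edges):
    """Drop noisy nodes, dangling edges, then nodes left without any edge."""
    kept = [(name, label) for (name, label) in nodes if filtered(label)]
    kept_ids = {name for (name, _) in kept}
    # Keep an edge only when BOTH ends survived the symbol filter.
    kept_edges = [(a, b) for (a, b) in edges if a in kept_ids and b in kept_ids]
    # Keep a node only when it still takes part in at least one edge.
    connected = {name for edge in kept_edges for name in edge}
    return [(n, l) for (n, l) in kept if n in connected], kept_edges


pruned_nodes, pruned_edges = prune(nodes, edges)
print(pruned_nodes)  # [('n1', 'main'), ('n2', 'Game::update(float)')]
print(pruned_edges)  # [('n1', 'n2')]
```

Here n3 is dropped by the symbol filter, the n2 -> n3 call disappears with it, and n4 is dropped because nothing calls it and it calls nothing.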

Third, I used a makefile to chain the steps.

    object_file.ll: object_file.cpp 2dhelpers.h.gch
        $(CC) -S -emit-llvm $< -o $@ $(args) $(inclflags)
    object_file.ll.callgraph.dot: object_file.ll
        opt $< -std-link-opts -dot-callgraph
    object_file.cxxfilt.dot: object_file.ll.callgraph.dot
        cat $< | llvm-cxxfilt > $@
    object_file.cleaned.dot: object_file.cxxfilt.dot
        python3 dot-parse.py $^
    object_file.final.svg: object_file.cleaned.dot
        dot -Tsvg $^ -o $@

Note that llvm-cxxfilt demangled symbols a bit better than c++filt, although it was not perfect in my case.

The DOT file written by opt is quite trivial to parse, but you can still use pydot, although I found it a little slow, which matters if you have a lot of symbols (I had 2500 symbols and 5000 calls, reduced to 116 and 151 after filtering).
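
As a rough idea of what parsing by hand can look like, here is a minimal sketch. It assumes the file contains one node per line in the form Node0x... [shape=record,label="{symbol}"]; and one edge per line in the form Node0x... -> Node0x...; which is an assumption about the output of opt and may differ between LLVM versions. The names parse_callgraph, NODE_RE, EDGE_RE and SAMPLE are hypothetical.

```python
import re

# Assumed line shapes in the file written by opt:
#   Node0x1234 [shape=record,label="{main}"];
#   Node0x1234 -> Node0x5678;
NODE_RE = re.compile(r'^\s*(Node0x[0-9a-fA-F]+)\s*\[.*label="\{(.*)\}"\];\s*$')
EDGE_RE = re.compile(r'^\s*(Node0x[0-9a-fA-F]+)\s*->\s*(Node0x[0-9a-fA-F]+)')


def parse_callgraph(text):
    """Return (nodes, edges) as lists of (id, label) and (source id, target id)."""
    nodes, edges = [], []
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            nodes.append((node.group(1), node.group(2)))
            continue
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2)))
    return nodes, edges


# Made-up sample in the assumed format.
SAMPLE = '''digraph "Call graph" {
	label="Call graph";

	Node0x1a [shape=record,label="{external node}"];
	Node0x1a -> Node0x2b;
	Node0x2b [shape=record,label="{main}"];
	Node0x2b -> Node0x3c;
	Node0x3c [shape=record,label="{Game::update(float)}"];
}'''

if __name__ == "__main__":
    nodes, edges = parse_callgraph(SAMPLE)
    print(nodes)
    print(edges)
```

The resulting nodes and edges lists have the shape expected by the filtering code shown earlier.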

If you really want to "combine" multiple object files into a single graph, you absolutely can, using llvm-link:

# linking llvm IR with this magic
all_objects.ll: file1.ll file2.ll file3.ll file4.ll file5.ll
    llvm-link -S $^ -o all_objects.ll

Then apply steps 2 to 5 of the makefile above (opt, llvm-cxxfilt, the Python filter, dot) to the resulting .ll file.

In my case the single big graph was not easy to read, as multiple edges would cross and form "rivers". Adding color did not help a lot.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	jokoon
