Can I ungarble GCC's RTTI names? - c++

Using gcc, when I ask for an object/variable's type using typeid, I get a different result from the type_info::name method from what I'd expect to get on Windows. I Googled around a bit, and found out that RTTI names are implementation-specific.
Problem is, I want to get a type's name as it would be returned on Windows. Is there an easy way to do this?

If it's what you're asking, there is no compiler switch that would make gcc behave like msvc regarding the name returned by type_info::name().
However, in your code you can rely on the gcc specific __cxa_demangle function.
There is in fact an answer on SO that addresses your problem.
Reference: libstdc++ manual, Chapter 40. Demangling.
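As a rough sketch of the __cxa_demangle route mentioned above: the helper name demangle and the sample type N::A are made up for illustration, and the code assumes GCC or Clang with the Itanium C++ ABI (it will not build with MSVC).

```cpp
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI only)
#include <memory>
#include <string>
#include <typeinfo>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle malloc()s its result, so let a unique_ptr free() it.
    std::unique_ptr<char, decltype(&std::free)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get())
                                 : std::string(mangled);
}

// Sample type, used only to have something to query with typeid.
namespace N { struct A {}; }
```

With GCC, demangle(typeid(const N::A*).name()) should give "N::A const*". That is readable, but it is still not the spelling MSVC produces, so any Windows-style formatting would have to be layered on top.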

C++ function names really include all the argument type information as well as the class and method name. When compiled, they are 'mangled' into a fixed form (each compiler has its own scheme) that can act as an assembler symbol and includes all the type information.
You need to run a function or program to reverse this mangling, called a demangler.
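Try running
c++filt < myoutput.txt
on the output of the function (c++filt reads names from standard input; for bare type names such as those returned by type_info::name(), add the -t option). This demangles the real symbol name back into a human-readable form.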

Based on this other question, Is there an online name demangler for C++?, I've written an online tool for this: c++filtjs

Related

How do I get LLVM types from Clang?

I'm looking to write a tool using Clang. The details are fairly immaterial, but what I'm looking to do is get an LLVM type from Clang. For example, I'd like to go from "printf" to llvm::Function*, and "size_t" to an llvm::Type*. But I can't find any functions in Clang that hand out these objects. I've decided that I can ask Clang to mangle the names, and then ask the llvm::Module* for the data, but I can't find how to get an llvm::Module* that corresponds to a Clang invocation.
How can I get the internal LLVM data from Clang?
Ultimately, the code generation APIs are not part of Clang's public API. This is for good reason, because they are awful.
However, you can create a clang::CodeGen::CodeGenModule, which you can use to codegen a given TU into a provided llvm::Module. Then, you can get the mangled name of a symbol by using the getMangledName function on the CodeGenModule.
However, do not attempt to use the provided functions for converting from a clang::QualType to an llvm::Type*; they are unusably broken. The only viable strategy I have found for reliably performing this conversion is to find a function whose signature contains the type (for example, a member function), and then query the Module for the type of the relevant parameter (for example, this). But this is pretty ABI-specific and a nasty hack. You can also compute the LLVM type name that Clang generates for a given type and search for it in the module, but this is not always successful.
In general, you cannot. Clang's internals are built in layers as well, so you cannot simply grab a piece of the AST and say 'Hey, give me a function'. The AST is converted to LLVM IR at the IR generation step, a module at a time. This way we can be sure everything is parsed and semantically correct and complete.
So, if you really need all this sort of thing, then you need to hook into clang really late, after IR generation and try to operate on loosely coupled AST and LLVM IR at that time.

Is there a portable wrapper for C++ type_info that standardizes type name string format?

The format of the output of type_info::name() is implementation specific.
namespace N { struct A; }
const N::A *a;
typeid(a).name(); // returns e.g. "const struct N::A" but compiler-specific
Has anyone written a wrapper that returns dependable, predictable type information that is the same across compilers? Multiple templated functions would allow the user to get specific information about a type. So I might be able to use:
MyTypeInfo::name(a); // returns "const struct N::A *"
MyTypeInfo::base(a); // returns "A"
MyTypeInfo::pointer(a); // returns "*"
MyTypeInfo::nameSpace(a); // returns "N"
MyTypeInfo::cv(a); // returns "const"
These functions are just examples; someone with better knowledge of the C++ type system could probably design a better API. The one I'm interested in is base(). All functions would raise an exception if RTTI was disabled or an unsupported compiler was detected.
This seems like the sort of thing that Boost might implement, but I can't find it in there anywhere. Is there a portable library that does this?
There are some limitations to doing such things in C++, so you probably won't find exactly what you want in the near future. The meta-information about types that the compiler inserts into the compiled code is also specific to the runtime library (RTL) used by the compiler, so it would be difficult for a third-party library to do a good job without relying on undocumented features of each specific compiler that might break in later versions.
The Qt framework has, to my knowledge, the nearest thing to what you intended. But they do that completely independent from RTTI. Instead, they have their own "compiler" that parses the source code and generates additional source modules with the meta-information. Then, you compile+link these modules along with your program and use their API to get the information. Take a look at http://doc.qt.nokia.com/latest/metaobjects.html
Jeremy Pack (from Boost Extension plugin framework) appears to have written such a thing:
http://blog.redshoelace.com/2009/06/resource-management-across-dll.html
3. RTTI does not always function as expected across DLL boundaries. Check out the type_info classes to see how I deal with that.
So you could have a look there.
PS. I remembered because I once fixed a bug in that area; this might still add information so here's the link: https://stackoverflow.com/a/5838527/85371
GCC has __cxa_demangle https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
If there are such extensions for all compilers you target, you could use them to write a portable function with macros to detect the compiler.
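A minimal sketch of that macro-based approach, assuming only two cases: GCC/Clang (detected via __GNUG__, demangled with __cxa_demangle) and everything else (the raw type_info::name() string is returned as-is). The name readable_type_name is hypothetical.

```cpp
#include <string>
#include <typeinfo>

#if defined(__GNUG__)            // defined by GCC and Clang when compiling C++
#  include <cstdlib>             // std::free
#  include <cxxabi.h>            // abi::__cxa_demangle
#endif

// Hypothetical wrapper: a readable name for T where we know how to get one.
// Note that typeid ignores top-level cv-qualifiers and references on T.
template <typename T>
std::string readable_type_name() {
    const char* raw = typeid(T).name();
#if defined(__GNUG__)
    int status = 0;
    char* text = abi::__cxa_demangle(raw, nullptr, nullptr, &status);
    std::string result = (status == 0 && text) ? text : raw;
    std::free(text);             // free(nullptr) is a harmless no-op
    return result;
#else
    // MSVC's name() is already human-readable; any other compiler falls
    // through with whatever implementation-specific string it returns.
    return raw;
#endif
}
```

This only makes the names readable everywhere; it does not make them identical. MSVC, for instance, includes the class-key (struct or class) in its names while the GCC demangler does not, so a truly uniform format would still need a normalisation step on top.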

C++ class function in assembly

Hello Community
I am looking at C++ assembly. I have compiled a benchmark from the PARSEC suite and I am having difficulty understanding how class member functions are named in assembly language. For example, if I have a class with some functions to manipulate it, in C++ we call them like test.increment();
After some investigation I found out that this function is
atomic_load_acq_int
represented as:
_ZL19atomic_load_acq_intPVj
in assembly, or at least this is what I have found out.
Let me know if I am wrong!
Is there some fixed rule for the mapping? or are they random?
Thanks
It's called name mangling. It is necessary because of overloads, templates and such: the plain chars-and-numbers name isn't enough to identify a chunk of code unambiguously; embedding spaces or <> or :: in names usually isn't legal; and copying the additional information in uncondensed, human-readable form would be wasteful. The mangled name therefore depends on types, arity, etc.
The exact scheme can vary, but usually each compiler is self-consistent for a relatively long time (sometimes even several compilers can settle for one way).
That's called name mangling. It is compiler dependent. No standard way, sorry :)
C++ allows function overloading, which means that one can have two functions with the same name but different parameters. Since binary formats do not understand types, this is a problem. The workaround is a scheme called name mangling. It adds a whole bunch of type information to the name used in the source file and ensures that one calls the correct overload.
The extra letters etc that are added are governed by the particular Application Binary Interface (ABI) being used. Different compilers (and sometimes even different versions) may use different ABIs.
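To make the overload case concrete, here is a small sketch. The function names are invented, and the symbols in the comments are what the Itanium C++ ABI (used by GCC and Clang on Linux) prescribes; other ABIs, such as MSVC's, produce different strings.

```cpp
// Two overloads share one source-level name but need distinct linker symbols.
// "_Z" marks a mangled name, "3add" is length + identifier, and the trailing
// letters encode the parameter types (i = int, d = double).
int add(int a, int b) { return a + b; }            // symbol: _Z3addii
double add(double a, double b) { return a + b; }   // symbol: _Z3adddd

// extern "C" turns mangling off: the symbol is plain "add_ints", which is
// also why extern "C" functions cannot be overloaded.
extern "C" int add_ints(int a, int b) { return a + b; }
```

Read the same way, the symbol from the question, _ZL19atomic_load_acq_intPVj, appears to decode as: L for internal linkage (a static function), a 19-character name atomic_load_acq_int, then P (pointer to) V (volatile) j (unsigned int), i.e. atomic_load_acq_int(unsigned int volatile*).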
Yes, there's a standard method for creating these symbols, known as name mangling.

How can I get nm to show the return types of a function?

I'm trying to write a script to produce a 'fake' version of a huge and messy code library. I thought using 'nm' on the binary and filtering just the text symbols might be the way to go, but I can't seem to get nm to display the return type of the function as well as the signature.
Many thanks in advance.
For ordinary (non-template) functions, the return type is not part of the mangled name, so nm has nothing to show; under the Itanium ABI that GCC uses, only template function instantiations encode it. Return types are enforced by the compiler directly, based on type rules.
It is possible to call a function defined as, for example, returning int, while having a declaration for it returning, say, char. Most tools will not notice the mismatch. Considering all the ways you might shoot yourself in the foot, this isn't too bad, since you would have to go out of your way to do it, for example by not using a header file common to both modules.
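One way to see this for yourself is to demangle two symbols and compare. The sketch below assumes GCC or Clang with the Itanium C++ ABI; demangle_symbol is a hypothetical helper, and the two symbol strings are what that ABI prescribes for int foo(int) and for template <class T> T max(T, T) instantiated with int.

```cpp
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang only)
#include <string>

// Hypothetical helper: demangled text for `symbol`, or "" on failure.
std::string demangle_symbol(const char* symbol) {
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    std::string result = (status == 0 && text) ? text : "";
    std::free(text);  // the buffer was malloc()ed by __cxa_demangle
    return result;
}

// Plain function: no return type is encoded in the symbol.
const char* const kPlainSymbol = "_Z3fooi";              // int foo(int)
// Template instantiation: the return type (T_) is part of the symbol.
const char* const kTemplateSymbol = "_Z3maxIiET_S0_S0_"; // int max<int>(int, int)
```

Demangling kPlainSymbol gives foo(int) with no return type, while kTemplateSymbol gives int max<int>(int, int). For a script that fakes a library, the return types of ordinary functions therefore have to come from headers or debug information rather than from nm.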

g++ generated Assembly looks ugly

I'm quite familiar with gcc assembly... Recently I was forced to use g++ for some code cleanup. Let me mention I'm very familiar with assembly, hence out of curiosity I often take a look at how good the compiler generated asm is.
But the naming conventions with g++ are just bizarre. I was wondering if there are any guidelines on how to read its asm output ?
Thanks a lot.
I don't find g++'s asm 'ugly' or hard to understand, though I've been working with GCC for over 8 years now.
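On Linux, mangled function labels start with _Z, the prefix that marks a C++ mangled name (as opposed to a C symbol). For a function inside a namespace or class, this is followed by N, then the enclosing namespaces and classes, the function name, an E, and finally the argument types. Template arguments, if any, are encoded right after the name they belong to.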
Example:
// tests::vec4::testEquality()
_ZN5tests4vec412testEqualityEv
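_ZN - '_Z' marks C++ mangling, 'N' opens a nested (qualified) name ('_ZZ' marks an entity local to a function)
5tests - length (5 chars) + name
4vec4 - length (4 chars) + nested namespace or class
12testEquality - length (12 chars) + function name
Ev - 'E' closes the nested name, 'v' is the void (empty) argument list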
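The length-prefix reading used in this breakdown can be sketched in a few lines. This is only an illustration of the idea, with a made-up function name; it handles plain identifiers inside _ZN...E and nothing else (no templates, substitutions, operators or cv-qualified member functions), so a real tool should rely on c++filt or __cxa_demangle instead.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits the <length><identifier> components of a
// simple "_ZN...E" nested name. Returns an empty vector for anything else.
std::vector<std::string> split_nested_name(const std::string& mangled) {
    std::vector<std::string> parts;
    if (mangled.compare(0, 3, "_ZN") != 0) return parts;  // not a nested name

    std::size_t pos = 3;  // first character after "_ZN"
    while (pos < mangled.size() &&
           std::isdigit(static_cast<unsigned char>(mangled[pos]))) {
        // Read the decimal length prefix.
        std::size_t len = 0;
        while (pos < mangled.size() &&
               std::isdigit(static_cast<unsigned char>(mangled[pos]))) {
            len = len * 10 + static_cast<std::size_t>(mangled[pos] - '0');
            ++pos;
        }
        // Reject a length that runs past the end of the string.
        if (len > mangled.size() - pos) return {};
        parts.push_back(mangled.substr(pos, len));  // the identifier itself
        pos += len;
    }
    return parts;  // parsing stops at the closing 'E'
}
```

For _ZN5tests4vec412testEqualityEv this yields the three components tests, vec4 and testEquality, matching the breakdown above.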
From man g++:
-fverbose-asm
Put extra commentary information in the generated assembly code to make it more
readable. This option is generally only of use to those who actually need to read the
generated assembly code (perhaps while debugging the compiler itself).
If you're looking at the naming convention for external symbols then this will follow the name mangling convention of the platform that you are using. It can be reversed with the c++filt program which will give you the human readable version of C++ function names, although they will (in all probability) no longer be valid linker symbols.
If you're just looking at local function labels, then you're out of luck. g++'s assembler output is for talking to the assembler and not really designed for ease of human comprehension. It's going to generate a set of relatively meaningless labels.
If the code has debugging information, objdump can provide a more helpful disassembly:
-S, --source Intermix source code with disassembly
-l, --line-numbers Include line numbers and filenames in output
For people who are working on demangling those names inside the program (like me), hopefully this thread helps.
import subprocess

def demangle(name):
    # Run c++filt (from GNU binutils) on a single mangled symbol.
    result = subprocess.run(['c++filt', name],
                            capture_output=True, text=True, check=True)
    # c++filt prints one demangled name per line; keep the first.
    return result.stdout.splitlines()[0]

print(demangle('_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE17_M_stringbuf_initESt13_Ios_Openmode'))