A Function object is a constructor from LLVM pass - llvm

I am writing an LLVM pass. I have a Function object from LLVM and want to know whether it is a class constructor. How can I do this? Any suggestions?

Well... It looks like the only thing you can do is demangle the function name and deduce from it whether the function is a constructor. That is not 100% precise, though, since you would need to handle multiple mangling schemes.

As Anton said, we can demangle the function name.
LLVM has a demangler in llvm/Demangle/Demangle.h.
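Its usage is shown in the question "How to get function base name from mangled function name using LLVM ItaniumDemangle API":
#include "llvm/Demangle/Demangle.h"
ItaniumPartialDemangler Demangler;
// partialDemangle() returns true on failure, so only query the result on success.
if (!Demangler.partialDemangle(F.getName().str().c_str()) &&
    Demangler.isCtorOrDtor()) {
    // isCtorOrDtor() is also true for destructors.
    outs() << F.getName() << " is a constructor or destructor!\n";
}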
This may not be an elegant way to determine whether a function is a constructor, as demangling every function name may slow down compilation.
I am still waiting for a better approach.
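As a rough illustration of the demangle-and-inspect idea outside an LLVM pass, the sketch below uses abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang on Itanium-ABI platforms) instead of LLVM's ItaniumPartialDemangler, and it also separates constructors from destructors. The names SpecialMember, demangle and classify are made up for this example. It is only a heuristic: it assumes a non-template class outside an anonymous namespace, and it would misreport a free function that has the same name as its enclosing namespace.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

enum class SpecialMember { None, Constructor, Destructor };

// Demangle an Itanium-ABI symbol; returns an empty string on failure.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // __cxa_demangle allocates with malloc, so the buffer is released with free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string();
}

// Heuristic: in "N::A::A(int)" the last two name components match; in
// "N::A::~A()" the last component is '~' followed by the class name.
SpecialMember classify(const char* mangled) {
    std::string name = demangle(mangled);
    const std::size_t paren = name.find('(');
    if (paren == std::string::npos) return SpecialMember::None;
    name.erase(paren);                                  // "N::A::A"
    const std::size_t last = name.rfind("::");
    if (last == std::string::npos) return SpecialMember::None;
    const std::string member = name.substr(last + 2);   // "A" or "~A"
    const std::string scope = name.substr(0, last);     // "N::A"
    const std::size_t prev = scope.rfind("::");
    const std::string cls =
        prev == std::string::npos ? scope : scope.substr(prev + 2);
    if (member == cls) return SpecialMember::Constructor;
    if (member == "~" + cls) return SpecialMember::Destructor;
    return SpecialMember::None;
}
```

Inside a pass, the argument would be something like F.getName().str().c_str(); symbols that are not Itanium-mangled simply come back as SpecialMember::None.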

Related

How do I get LLVM types from Clang?

I'm looking to write a tool using Clang. The details are fairly immaterial, but what I'm looking to do is get an LLVM type from Clang. For example, I'd like to go from "printf" to an llvm::Function*, and from "size_t" to an llvm::Type*. But I can't find any functions in Clang that hand out these objects. I've decided that I can ask Clang to mangle the names and then ask the llvm::Module* for the data, but I can't find how to get an llvm::Module* that corresponds to a Clang invocation.
How can I get the internal LLVM data from Clang?
Ultimately, the code generation APIs are not part of Clang's public API. This is for good reason, because they are awful.
However, you can create a clang::CodeGen::CodeGenModule, which you can use to codegen a given TU into a provided llvm::Module. Then, you can get the mangled name of a symbol by using the getMangledName function on the CodeGenModule.
However, do not attempt to use the provided functions for converting from a clang::QualType to an llvm::Type*- they are unusably broken. The only viable strategy I have found for reliably performing this conversion is to find a function with a signature, for example, a member function, and then query the Module for the type of that parameter, for example, this. But this is pretty ABI-specific and a nasty hack. You can also compute the LLVM type name that Clang generates for a given type and search for it in the module, but this is not always successful.
In general, you cannot. Clang's internals are built in layers as well, so you cannot simply grab a piece of the AST and say 'Hey, give me a function'. The AST is converted to LLVM IR at the IR generation step, a module at a time. This way we can be sure everything is parsed and semantically correct and complete.
So, if you really need all this sort of thing, then you need to hook into clang really late, after IR generation and try to operate on loosely coupled AST and LLVM IR at that time.

Is there a portable wrapper for C++ type_info that standardizes type name string format?

The format of the output of type_info::name() is implementation specific.
namespace N { struct A; }
const N::A *a;
typeid(a).name(); // returns e.g. "const struct N::A" but compiler-specific
Has anyone written a wrapper that returns dependable, predictable type information that is the same across compilers? Multiple templated functions would allow the user to get specific information about a type. So I might be able to use:
MyTypeInfo::name(a); // returns "const struct N::A *"
MyTypeInfo::base(a); // returns "A"
MyTypeInfo::pointer(a); // returns "*"
MyTypeInfo::nameSpace(a); // returns "N"
MyTypeInfo::cv(a); // returns "const"
These functions are just examples; someone with better knowledge of the C++ type system could probably design a better API. The one I'm interested in is base(). All functions would raise an exception if RTTI was disabled or an unsupported compiler was detected.
This seems like the sort of thing that Boost might implement, but I can't find it in there anywhere. Is there a portable library that does this?
There are some limitations on doing such things in C++, so you probably won't find exactly what you want in the near future. The meta-information about types that the compiler inserts into the compiled code is also specific to the runtime library (RTL) used by the compiler, so it would be difficult for a third-party library to do a good job without relying on undocumented features of each specific compiler, which might break in later versions.
The Qt framework has, to my knowledge, the nearest thing to what you want, but it works completely independently of RTTI. Instead, Qt has its own "compiler" (the Meta-Object Compiler, moc) that parses the source code and generates additional source modules with the meta-information. You then compile and link these modules along with your program and use the Qt API to get the information. Take a look at http://doc.qt.nokia.com/latest/metaobjects.html
Jeremy Pack (from Boost Extension plugin framework) appears to have written such a thing:
http://blog.redshoelace.com/2009/06/resource-management-across-dll.html
3. RTTI does not always function as expected across DLL boundaries. Check out the type_info classes to see how I deal with that.
So you could have a look there.
PS. I remembered because I once fixed a bug in that area; this might still add information so here's the link: https://stackoverflow.com/a/5838527/85371
GCC has __cxa_demangle https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
If there are such extensions for all compilers you target, you could use them to write a portable function with macros to detect the compiler.
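A minimal sketch of such a macro-guarded wrapper might look like the following. It assumes that GCC and Clang (when not targeting the MSVC ABI) provide abi::__cxa_demangle, and it simply falls back to the raw type_info::name() string elsewhere. The macro MYTYPEINFO_USE_CXA_DEMANGLE and the function readable_name are invented for this example and are not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>
#include <typeinfo>

// Compiler detection: GCC and Clang define __GNUC__; clang-cl and MSVC define _MSC_VER.
#if (defined(__GNUC__) || defined(__clang__)) && !defined(_MSC_VER)
#include <cxxabi.h>  // abi::__cxa_demangle
#define MYTYPEINFO_USE_CXA_DEMANGLE 1
#endif

// Returns a human-readable spelling of the type described by `info`.
std::string readable_name(const std::type_info& info) {
#ifdef MYTYPEINFO_USE_CXA_DEMANGLE
    int status = 0;
    // The demangled buffer is malloc-allocated, so it is released with free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(info.name(), nullptr, nullptr, &status), std::free);
    if (status == 0 && text) return text.get();
#endif
    // Fallback: whatever the implementation reports (already readable on MSVC).
    return info.name();
}

// Convenience overload for a statically known type.
template <typename T>
std::string readable_name() {
    return readable_name(typeid(T));
}
```

This only makes the names readable; it does not make them identical across compilers, since each demangler still chooses its own spelling for cv-qualifiers, pointers and class keys. A fully uniform format would need extra normalization on top of this.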

Is there any way to replace a function in a library?

I work with a library that defines its own internal division operator for a scripting language. Unfortunately, it does not zero-check the divisor, which leads to a lot of headaches. I know the signature of the operator.
double ScriptClass::Divide(double&, double&);
Sadly it isn't even a C function. Is there any way I could make my application use my own Divide function instead of ScriptClass::Divide function?
EDIT:
I was aware of dlopen(NULL,..) and replacing "C" functions with user defined ones. Can this be done for class member functions (Without resorting to using mangled names)?
Various linkers and dynamic linker implementations will provide something that looks like a solution to this, as others have mentioned.
However, if you redefine one C++ function using any of those features (GNU ld's --wrap, ld.so's LD_PRELOAD, etc.), you are violating the one-definition rule and are thus invoking undefined behaviour.
While compiling your library, the compiler is allowed to inline the function in question in any way that it sees fit, which means that your redefinition of the function might not be invoked in all cases.
Consider the following code:
#include <iostream>
class A
{
public:
    void foo();
    void bar();
};
void A::foo()
{
    std::cout << "Old version.\n";
}
void A::bar()
{
    foo();  // the compiler may inline A::foo() here
}
GCC 4.5, when invoked with -O3, will actually decide to inline the definition of foo() into bar(). If you somehow made your linker replace this definition of A::foo() with a definition of your own, A::bar() would still output the string "Old version.\n".
So, in a word: don't.
Generally speaking it's up to the programmer, not the underlying divide operator to prevent division by zero. If you're dividing by zero a lot that seems to indicate a possible flaw in the algorithm being used. Consider reworking the algorithm, or if that's not an option, guard calls to divide with a zero check. You could even do that inside a protected_divide type function.
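The guard suggested above could be sketched roughly as follows. Since the real ScriptClass::Divide is not available here, library_divide is a stand-in that assumes plain division, and the name protected_divide and its fallback parameter are illustrative choices rather than anything the library provides.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-in for double ScriptClass::Divide(double&, double&); the real
// implementation is unknown, so plain division is an assumption.
double library_divide(double& numerator, double& divisor) {
    return numerator / divisor;
}

// Hypothetical guard: checks the divisor before forwarding to the library.
// Returning a fallback is one possible policy; throwing would be another.
double protected_divide(
    double numerator, double divisor,
    double fallback = std::numeric_limits<double>::quiet_NaN()) {
    if (divisor == 0.0) {  // also true for -0.0
        return fallback;
    }
    return library_divide(numerator, divisor);
}
```

Callers would then use protected_divide(a, b) everywhere instead of calling the library's Divide directly, which keeps the zero check in one place without touching the library.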
All that being said, this looks like a C++ function, so assuming the library was compiled with the same options you use to build your application (so that the name mangling matches), you might be able to put a redefinition of the function into a .so and use LD_PRELOAD to force it to load first. If you link statically, I think you can compile the function into your own .o file; linking that before the library itself should cause the linker to pick up your version.
LD_PRELOAD is your friend. As an example, see:
https://web.archive.org/web/20090130063728/http://ibm.com/developerworks/linux/library/l-glibc.html
There's no getting away from the mangled names, I don't think, but you can use ld's --wrap option to cause a particular function to be given a new name based on its old name. You can then write a new version of it, and forward to the old version too if you like.
Quick overview here:
http://linux.die.net/man/1/ld
I've used this in the past to hook into malloc (etc.) without having to recompile the runtime library, though this wasn't on Linux (it was an embedded thing with no runtime loading). I didn't use it to wrap C++ functions, but if you can handle the C++ calling convention somehow, and you can create a function with the original function's mangled name, and get the compiler to accept a call to a function that has some ugly name with funny chars in it... I don't see why it shouldn't be possible to make it work.
Just a short question:
Can't you just wrap the class with your own code?
It will be a bit of a headache at the start, but after that you can simplify a lot of functions.
(Or even just wrap the function with a macro.)

In C++ how is function overloading typically implemented?

If there is no function overloading, the function name serves as the address of the function code, and when a function is being called, its address is easy to find using its name. However with function overloading, how exactly can the program find the correct function address? Is there a hidden table similar to virtual tables that stores the overloaded functions with their address? Thanks a lot!
Name mangling.
It's all done at compile time. The C++ compiler actually modifies the function names you give it internally, so that a function like
int foo(int a, float b, char c)
internally gets a name equivalent to
func_foo_int_float_char()
(the real symbol is usually some gobbledygook like ?CFoo@Foo@@QAAX_N@Z ).
As you can see, the name is decorated depending on the exact number and types of parameters passed. So, when you call a function, it's easy for the compiler to look at the parameters you are passing, decorate the function name with them, and come up with the correct symbol. For example,
int a, b; float f; char c;
foo(a,f,c) ; // compiler looks for an internal symbol called func_foo_int_float_char
foo(a,b,c) ; // compiler looks for a symbol called func_foo_int_int_char
Again, it's all done completely at compile time.
The compiler can look at the call, and match that against the known existing overloaded implementations, and pick the right one. No need for a dynamic table, it's all perfectly doable statically at compile-time.
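To make the "all at compile time" point concrete, here is a small sketch in which the two foo overloads from the example get different return types (an assumption made purely so the choice becomes observable). The pointer names foo_ifc and foo_iic are invented for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Two overloads with different return types so the selection is visible.
int    foo(int, float, char) { return 1; }
double foo(int, int, char)   { return 2.0; }

// decltype inspects the call without running it: the overload is already
// fixed during compilation, based only on the argument types.
static_assert(std::is_same<decltype(foo(0, 0.0f, 'c')), int>::value,
              "(int, float, char) selects the first overload");
static_assert(std::is_same<decltype(foo(0, 0, 'c')), double>::value,
              "(int, int, char) selects the second overload");

// Each overload is an ordinary function with its own address; the pointer
// type tells the compiler which one is meant.
int    (*const foo_ifc)(int, float, char) = &foo;
double (*const foo_iic)(int, int, char)   = &foo;
```

If either static_assert were wrong, the program would fail to compile, which shows that no runtime table is consulted to pick the overload.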
Update: removed my attempt at illustrating the concept by showing differently-named functions that the compiler can choose between.
If you are talking about overloaded methods of the same class, like so:
void function(int n);
void function(char *s);
...
objectInstance->function("Hello World")
It is a compile time thingy. The compiler knows (or in some situations, makes a best guess) at this point which method to call.
A comment I made in the question, I repeat here.
People who suggest name mangling are misguided I think. It is not as if the compiler mangles the name and just does a lookup among the mangled names. It needs to infer the proper types from the available methods. Once it does that, it already knows which method to call. It then uses the mangled name as the last step. Name mangling is not a prerequisite for determining which overloaded function to call.
Overloaded functions are resolved at compile-time. The compiler finds a suitable match for the given set of parameters and simply calls the corresponding function by its address (void foo(int) and void foo() are practically two totally independent functions - if you have foo(4) in your code, the compiler knows which function to call).
It is, I believe, achieved through name mangling:
the functions you know as foo(int) and foo(double) are actually named something like int_foo() and double_foo() (or similar, I'm not entirely sure of the particular semantics employed for C++). This means that C++ symbols are usually an order of magnitude larger than the names they are given in code.
C++ compilers use name mangling (different name for each overload) to distinguish between the functions in the object file. For example
int test(int a){ return 0; }
int test(float a,float b){ return 0; }
int test(double a){ return 0; }
int testbam(double a){ return 0; }
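would produce the symbol names __Z4testi, __Z4testff, __Z4testd, __Z7testbamd (as emitted on macOS, where symbols get an extra leading underscore; on Linux they are _Z4testi, _Z4testff, _Z4testd, _Z7testbamd). This name mangling is highly compiler-dependent (sadly) and is one of the many reasons why C is often preferred over C++.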
When calling the function test, the compiler matches the given argument types and number of arguments against each function overload. The function prototypes are then used to find out which one should be called.
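The pattern in those symbol names can be reproduced with a toy mangler. The sketch below follows the Itanium C++ ABI scheme used by GCC and Clang for free functions in the global namespace ("_Z", then the length of the name, the name, and one code per parameter) and covers only four builtin types. toy_mangle is a made-up helper for illustration, not a real compiler interface.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy mangler for global free functions taking a few builtin types.
// Real mangling also handles namespaces, classes, templates, pointers, etc.
std::string toy_mangle(const std::string& name,
                       const std::vector<std::string>& params) {
    static const std::map<std::string, char> codes = {
        {"int", 'i'}, {"float", 'f'}, {"double", 'd'}, {"char", 'c'}};
    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (params.empty()) {
        return out + 'v';  // an empty parameter list is encoded as void
    }
    for (const std::string& p : params) {
        const auto it = codes.find(p);
        if (it == codes.end()) {
            throw std::invalid_argument("unsupported type: " + p);
        }
        out += it->second;
    }
    return out;
}
```

For example, toy_mangle("test", {"float", "float"}) yields the same string as the second symbol listed above, minus the extra leading underscore that macOS adds.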
The function signature is composed of the function name + parameter(s) type(s)
Even without function overloading, compilers usually decorate function and variable names. This is called name mangling, and it happens in both C and C++. A function name can be decorated according to, notably, (1) the calling convention, (2) C++ function overloading, and (3) class membership.
GNU binutils' c++filt can undecorate such mangled names; on Windows there is UnDecorateSymbolName.

Can I ungarble GCC's RTTI names?

Using gcc, when I ask for an object/variable's type using typeid, I get a different result from the type_info::name method from what I'd expect to get on Windows. I Googled around a bit, and found out that RTTI names are implementation-specific.
Problem is, I want to get a type's name as it would be returned on Windows. Is there an easy way to do this?
If it's what you're asking, there is no compiler switch that would make gcc behave like msvc regarding the name returned by type_info::name().
However, in your code you can rely on the gcc specific __cxa_demangle function.
There is in fact an answer on SO that addresses your problem.
Reference: libstdc++ manual, Chapter 40. Demangling.
C++ function names really include all the return and argument type information as well as the class and method name. When compiled, they are 'mangled' into a standard form (standard for each compiler) that can act as an assembler symbol and includes all the type information.
You need to run a function or program to reverse this mangling, called a demangler.
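Try running
c++filt -t < myoutput.txt
on the output of the function. c++filt reads the names from standard input (a bare file name argument would itself be treated as a symbol), and -t makes it demangle type names, such as those returned by type_info::name(), as well as symbol names. This turns the real name back into a human-readable form.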
Based on this other question, "Is there an online name demangler for C++?", I've written an online tool for this: c++filtjs