llvm- Defining function in cpp and creating a call - llvm

In the llvm project tutorials, they usually have a Skeleton file in which an external function is called, while it's body is implemented in a c file whose .bc will be linked to have the resulting bitcode to find the external function.
However, looking at the implemented LLVM projects in github, I do not see them using any c file and linking it to the resulting bitcode.
My question is how I can define a function and create a call to the function. Is defining intrinsic functions the only way?
When defining a function in cpp, and having a createCall to the function, it does not find definition of the function defined in cpp when running the bitcode/or the binary.

I'm not sure if I understand your question right, but will try to answer.
When you do Function* myF = module->getOrInsertFunction("myF", ...); you just create a declaration for it. Much like void myF(...); in C/C++ header file.
To turn myF into a definition, you create BasicBlocks, populate them with Instructions and then insert these BasicBlocks into myF. This will make myF defined in your module and you won't see "definition not found" error anymore.

Related

How does MSVC++ CRT include files depending on the main function type?

When you compile a program using MSVC and Microsoft's CRT you'll notice it auto-magically figures out where your main function is, what the prototype is and calls into it.
This doesn't have much to do with calling conventions (as posted before in a "similar" question) since you can override the default calling convention AND on Windows x64 the only calling convention is __fastcall.
With that being said, how does Microsoft's CRT implementation figure out which main function you've declared and then includes a header file with definitions such as _SCRT_STARTUP_MAIN, _SCRT_STARTUP_WMAIN, _SCRT_STARTUP_WINMAIN, _SCRT_STARTUP_WWINMAIN, etc etc?
For example, if you make a function like so and compile as an exe: int __cdecl main()... the linker will somehow do the steps as follows:
Somehow figure out which function you're using, no matter the function prototype. As long as it's one of the valid main prototypes, it'll work regardless of the arguments or return types.
Then it will somehow use the file exe_main.cpp which has a definition _SCRT_STARTUP_MAIN defined right before including exe_common.inl.
Finally, inside exe_common.inl there's a check for the macro _SCRT_STARTUP_MAIN and will finally invoke the correct main function (once again, magically, without having issues with multiple definitions).
Is there any way whatsoever to mimic this behavior in my own project so I can provide any type of main function and auto-magically figure out what type of main function is being used?

Extract references/Compile a single function in C/C++

I recently got into the situation where i had access to a huge code base, which i couldn't build and i needed to test couple of its functions.
Nevertheless, those functions had references of functions/variables found in other files so it is a big mess trying to extract them manually.
Is there a way to do this automatically?
For example, i want to test function foo in test.c but foo depends on bar function found in file test2.c. The bar function could then be dependent on booz which is found in test3.c.
So in the case above, one could gather foo, bar, and booz in one file and compile.
To generate cross references you can use Doxygen.
You can have a look for PostgreSQL C source code here and search for example definition of main function.

C/C++ header and implementation files: How do they work?

This is probably a stupid question, but I've searched for quite a while now here and on the web and couldn't come up with a clear answer (did my due diligence googling).
So I'm new to programming... My question is, how does the main function know about function definitions (implementations) in a different file?
ex. Say I have 3 files
main.cpp
myfunction.cpp
myfunction.hpp
//main.cpp
#include "myfunction.hpp"
int main() {
int A = myfunction( 12 );
...
}
-
//myfunction.cpp
#include "myfunction.hpp"
int myfunction( int x ) {
return x * x;
}
-
//myfunction.hpp
int myfunction( int x );
-
I get how the preprocessor includes the header code, but how do the header and main function even know the function definition exists, much less utilize it?
I apologize if this isn't clear or I'm vastly mistaken about something, new here
The header file declares functions/classes - i.e. tells the compiler when it is compiling a .cpp file what functions/classes are available.
The .cpp file defines those functions - i.e. the compiler compiles the code and therefore produces the actual machine code to perform those actions that are declared in the corresponding .hpp file.
In your example, main.cpp includes a .hpp file. The preprocessor replaces the #include with the contents of the .hpp file. This file tells the compiler that the function myfunction is defined elsewhere and it takes one parameter (an int) and returns an int.
So when you compile main.cpp into object file (.o extension) it makes a note in that file that it requires the function myfunction. When you compile myfunction.cpp into an object file, the object file has a note in it that it has the definition for myfunction.
Then when you come to linking the two object files together into an executable, the linker ties the ends up - i.e. main.o uses myfunction as defined in myfunction.o.
You have to understand that compilation is a 2-steps operations, from a user point of view.
1st Step : Object compilation
During this step, your *.c files are individually compiled into separate object files. It means that when main.cpp is compiled, it doesn't know anything about your myfunction.cpp. The only thing that he knows is that you declare that a function with this signature : int myfunction( int x ) exists in an other object file.
Compiler will keep a reference of this call and include it directly in the object file. Object file will contain a "I have to call myfunction with an int and it will return to me with an int. It keeps an index of all extern calls in order to be able to link with other afterwards.
2nd Step : Linking
During this step, the linker will take a look at all those indexes of your object files and will try to solve dependencies within those files. If one is not there, you'll get the famous undefined symbol XXX from it. He will then translate those references into real memory address in a result file : either a binary or a library.
And then, you can begin to ask how is this possible to do that with gigantic program like an Office Suite, which have tons of methods & objects ? Well, they use the shared library mechanism. You know them with your '.dll' and/or '.so' files you have on your Unix/Windows workstation. It allows to postpone solving of undefined symbol until the program is run.
It even allows to solve undefined symbol on demand, with dl* functions.
1. The principle
When you write:
int A = myfunction(12);
This is translated to:
int A = #call(myfunction, 12);
where #call can be seen as a dictionary look-up. And if you think about the dictionary analogy, you can certainly know about a word (smogashboard ?) before knowing its definition. All you need is that, at runtime, the definition be in the dictionary.
2. A point on ABI
How does this #call work ? Because of the ABI. The ABI is a way that describes many things, and among those how to perform a call to a given function (depending on its parameters). The call contract is simple: it simply says where each of the function arguments can be found (some will be in the processor's registers, some others on the stack).
Therefore, #call actually does:
#push 12, reg0
#invoke myfunction
And the function definition knows that its first argument (x) is located in reg0.
3. But I though dictionaries were for dynamic languages ?
And you are right, to an extent. Dynamic languages are typically implemented with a hash table for symbol lookup that is dynamically populated.
For C++, the compiler will transform a translation unit (roughly speaking, a preprocessed source file) into an object (.o or .obj in general). Each object contains a table of the symbols it references but for which the definition is not known:
.undefined
[0]: myfunction
Then the linker will bring together the objects and reconciliate the symbols. There are two kinds of symbols at this point:
those which are within the library, and can be referenced through an offset (the final address is still unknown)
those which are outside the library, and whose address is completely unknown until runtime.
Both can be treated in the same fashion.
.dynamic
[0]: myfunction at <undefined-address>
And then the code will reference the look-up entry:
#invoke .dynamic[0]
When the library is loaded (DLL_Open for example), the runtime will finally know where the symbol is mapped in memory, and overwrite the <undefined-address> with the real address (for this run).
As suggested in Matthieu M.'s comment, it is the linker job to find the right "function" at the right place. Compilation steps are, roughly:
The compiler is invoked for each cpp file and translate it to an
object file (binary code) with a symbol table which associates
function name (names are mangled in c++) to their location in the
object file.
The linker is invoked only one time: whith every object file in
parameter. It will resolve function call location from one object
file to another thanks to symbol tables. One main() function MUST
exist somewhere. Eventually a binary executable file is produced
when the linker found everything it needs.
The preprocessor includes the content of the header files in to the cpp files (cpp files are called translation unit).
When you compile the code, each translational unit separately is checked for semantic and syntactic errors. The presence of function definitions across translation units is not considered. .obj files are generated after compilation.
In the next step when the obj files are linked. the definition of functions (member functions for classes) that are used gets searched and linking happens. If the function is not found a linker error is thrown.
In your example, If the function was not defined in myfunction.cpp, compilation would still go on with no problem. An error would be reported in the linking step.
int myfunction(int); is the function prototype. You declare function with it so that compiler knows that you are calling this function when you write myfunction(0);.
And how do the header and main function even know the function definition exists?
Well, this is the job of Linker.
When you compile a program, the preprocessor adds source code of each header file to the file that included it. The compiler compiles EVERY .cpp file. The result is a number of .obj files.
After that comes the linker. Linker takes all .obj files, starting from you main file, Whenever it finds a reference that has no definition (e.g. a variable, function or class) it tries to locate the respective definition in other .obj files created at compile stage or supplied to linker at the beginning of linking stage.
Now to answer your question: each .cpp file is compile into a .obj file containing instructions in machine code. When you include a .hpp file and use some function that's defined in another .cpp file, at linking stage the linker looks for that function definition in the respective .obj file. That's how it finds it.

C++ including a ".h" file, function duplication confusion

I'm currently writing a program, and couldn't figure out why I got an error (note: I already fixed it, I'm curious about WHY the error was there and what this implies about including .h files).
Basically, my program was structured as follows:
The current file I'm working with, I'll call Current.cc (which is an implementation of Current.h).
Current.cc included a header file, named CalledByCurrent.h (which has an associated implementation called CalledByCurrent.cc). CalledByCurrent.h contains a class definition.
There was a non-class function defined in CalledByCurrent.cc called thisFunction(). thisFunction() was not declared in CalledByCurrent.h since it was not actually a member function of the class (just a little helper function). In Current.cc, I needed to use this function, so I just redefined thisFunction() at the top of Current.cc. However, when I did this, I got an error saying that the function was duplicated. Why is this, when myFunction() wasn't even declared in CalledByCurrent.h?
Thus, I just removed the function from Current.cc, now assuming that Current.cc had access to thisFunction() from CalledByCurrent.cc. However, when I did this, I found that Current.cc did not know what function I was talking about. What the heck? I then copied the function definition for thisFunction() to the top of my CalledByCurrent.h file and this resolved the problem. Could you help me understand this behavior? Particularly, why would it think there was a duplicate, yet it didn't know how to use the original?
p.s - I apologize for how confusing this post is. Please let me know if there's anything I can clear up.
You are getting multiple definitions from the linker - it sees two functions with the same name and complains. For example:
// a.cpp
void f() {}
// b.cpp
void f() {}
then
g++ a.cpp b.cpp
gives:
C:\Users\neilb\Temp\ccZU9pkv.o:b.cpp:(.text+0x0): multiple definition of `f()'
The way round this is to either put the definition in only one .cpp file, or to declare one or both of the functions as static:
// b.cpp
static void f() {}
You can't have two global functions with the same name (even in 2 different translation units). To avoid getting the linker error define the function as static so that it is not visible outside the translation unit.
EDIT
You can use the function in the other .cpp file by using extern keyword. See this example:
//Test.cpp
void myfunc()
{
}
//Main.cpp
extern void myfunc();
int main()
{
myfunc();
}
It will call myfunc() defined in test.cpp.
The header file inclusion mechanism should be tolerant to duplicate header file inclusions.
That's because whenever you simply declare a function it's considered in extern (global) scope (whether you declare it in a header file or not). Linker will have multiple implementation for the same function signature.
If those functions are truely helper functions then, declare them as;
static void thisFunction();
Other way, if you are using the same function as helper then, simply declare it in a common header file, say:
//CalledByCurrent.h (is included in both .cc files)
void thisFunction();
And implement thisFunction() in either of the .cc files. This should solve the problem properly.
Here are some ideas:
You didn't put a header include guard in your header file. If it's being included twice, you might get this sort of error.
The function's prototype (at the top) doesn't match its signature 100%.
You put the body of the function in the header file.
You have two functions of the same signature in two different source files, but they aren't marked static.
If you are using gcc (you didn't say what compiler you're using), you can use the -E switch to view the preprocessor output. This includes expanding all #defines and including all #includes.
Each time something is expanded, it tells you what file and line it was in. Using this you can see where thisFunction() is defined.
There are 2 distinct errors coming from 2 different phases of the build.
In the first case where you have a duplicate, the COMPILER is happy, but the LINKER is complaining because when it picks up all the function definitions across the different source files it notices 2 are named the same. As the other answers state, you can use the static keyword or use a common definition.
In the second case where you see your function not declared in this scope, its because the COMPILER is complaining because each file needs to know about what functions it can use.
Compiling happens before Linking, so the COMPILER cannot know ahead of time whether or not the LINKER will find a matching function, thats why you use declarations to notify the COMPILER that a definition will be found by the LINKER later on.
As you can see, your 2 errors are not contradictory, they are the result of 2 separate processes in the build that have a particular order.

Strange linking behavior, latest g++

I have been running into strange linking behavior with g++, however, I'm just a student, and I was wondering if this was normal.
I am trying to link assembly code (machine: fedora 14 gnome 32bits x86 i686 intel i7) with c++ code and to have the assembly code call a method from a fonction instanciated in the c++ file. It seems that implementing a method in the class declaration will prevent it from being put in the linker table unless it's used at least once in the original source.
class A
{
public:
void showSUP() {
cout<<"sup";
}
};
After instanciating A, you will not be able to call _ZN1A7showSUPEv because it has not been put in the linking table:
call _ZN1A7showSUPEv
However, if you call A::showSUP() in the same .cpp as A was declared, then calling it from a seperate assembly file will work.
With (and instantiation of A)
class A
{
void showSUP();
};
A::showSUP()
{
cout<<"sup";
}
Calling _ZN1A7showSUPEv will work.
My question is, why doesn't the first example work.
Thank you all in advance.
There are attributes, that you can specifify for a function in this way
classe A
{
public:
void showSUP(){
cout<<"sup";
} __attribute__((used))
};
see gcc attribute overview
used
Found in versions: 3.1-3.4
Description:
This attribute, attached to a function, means that code must be
emitted for the function even if it appears that the function is
not referenced. This is useful, for example, when the function
is
referenced only in inline assembly.
For inlined functions the compiler
will only output code where the function is used.
Functions defined
inside the class definition are
inline (usually).
The function isn't
used.
Therefore: no function in the
binary.
In general, if you want a function to be included in the final library / executable, it need be:
used
non-inlined
And inlined function is a function whose code is simply copied and pasted where the function is used (by the compiler) so that there is no function call. It's an opportunity optimization, so a function may be inlined in some places and not inlined in others, depending on the context. Most very short functions (so-called one-liners) are generally inlined.
In the old times, to be inlined a function needed be defined in the current translation unit, that is:
either it is defined in the header (like is your case), and thus can be inlined in all sources including this header
either it is defined in a source file, and thus can be inlined in this source file
Nowadays though we also have LTO (Link Time Optimization), and if activated the linker may actually inline function calls. Those optimizations are also responsible for cleaning up the resulting library/binary from unused symbols.
There are two possible solutions to your issue:
define the function in a source file instead, it is standard and may not be wiped out
use compiler specific attributes to mark the function as used, so that it's not wiped out
In the latter case, if you wish for portability, I can only advise using a macro (ATTRIBUTE_USED) and define its content depending on the current compiler used.