LLVM allows call instructions and function definitions to specify a calling convention. Does the IR itself already need to adhere to the specified convention? For example when using ccc, I believe that the return value would need to fit in the 64-bit rax register on my OS/architecture. Am I allowed to write LLVM IR code that returns a struct of 3 i32's? Does LLVM convert that to something that adheres to the C calling convention? Could I change the calling convention without changing any other code?
When I look at the output of compiling a C file with -emit-llvm, the IR generator already applied the calling convention, and would allocate at the call site, and convert the return value as a pointer parameter. Is that absolutely necessary at this stage? What does LLVM do with the information of which calling convention to use at the next stage, -emit-obj?
There are several things mixed together here, unfortunately. The calling convention is usually defined in terms of the source language, and many of the necessary details are already lost by the time the code is converted to LLVM IR. So, in order to preserve the ABI and calling convention, the frontend is supposed to lay out arguments and return values properly so that they are code-generated correctly at the LLVM level.
So, to make a long story short: the calling convention contains both high-level (source language) and low-level requirements. The former are handled by the frontend and the latter by the backend. You can change the LLVM IR, but you need to make sure that the generated code is indeed compatible with your C code. On some platforms this can be complicated.
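As a rough illustration of the frontend's role, here is a minimal C sketch of a function that returns a struct of three 32-bit integers by value. The names `Triple` and `make_triple` are made up for this example. What the frontend emits for it depends on the target ABI: the return may be coerced to other integer types or turned into a hidden pointer parameter, so it is worth inspecting the output of `clang -S -emit-llvm` on your own platform rather than assuming a literal struct return in the IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example type: three 32-bit integers. */
struct Triple {
    int32_t a;
    int32_t b;
    int32_t c;
};

/* This sketch assumes the struct has no padding (12 bytes in total). */
static_assert(sizeof(struct Triple) == 12, "unexpected padding in Triple");

/* At the C level the struct is simply returned by value. Whether it travels
   back in registers or in memory provided by the caller is decided by the
   target ABI, and the frontend encodes that choice in the IR it emits. */
struct Triple make_triple(int32_t a, int32_t b, int32_t c)
{
    struct Triple t = { a, b, c };
    return t;
}
```

Comparing the IR produced for a 32-bit and a 64-bit target is one way to see how much of the convention has already been applied before the backend runs.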
I am currently working on implementing a JIT compiler using LLVM.
The problem I have is that a portion of the compiler runtime is implemented in C.
From my intermediate representation, I can generate all native functions. However, certain operations in the language require calls to these external C-functions.
The difficulty is that my IR does not tell me the exact types of the parameters passed to these functions: they might take an int, or they might take a float. The previous code generation was based on C, and the weak typing of C made it possible to call these functions with no hassle. However, when generating LLVM IR I need to know the signatures of these pre-compiled functions. Is there any way to figure out the signature using the ORC API or some other method, or would it be better to just hard-code the configuration for each necessary function?
The solution I implemented for this problem follows the following simple scheme.
During compilation, calls to these functions are generated with a signature based simply on the parameters and return values that are passed to the pre-compiled function in question.
This works well for my example since I use the C calling convention with LLVM. To quote the LLVM documentation, the C calling convention tolerates some mismatch:
"This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C)." [1]: https://llvm.org/docs/LangRef.html#calling-conventions
I was trying to add a function declaration for something provided by the system.
However, the function prototype returns size_t, which is int32 on 32bit platform and int64 on 64bit platform.
I'd like to know if there is a method to detect target platform and add the declaration accordingly?
After a bit of research: LLVM IR, as a target-neutral language, cannot possibly know target-specific type sizes. Have a look at this relevant discussion where Chris Lattner comments on the subject. Also, at this relevant SO question.
So, this is the job of the front-end and this causes extra bookkeeping information that front-ends need to "know" for a target and its ABI. So, for example, you might have needs for projects like this in the case of the Loci programming language.
Now, specifically for size_t according to this:
[...] std::size_t can safely store the value of any non-member pointer, in which case it is synonymous with std::uintptr_t.
So, you could use the getIntPtrType method of DataLayout class.
For any other data types, I'm not sure how far "guessing" can get you (probably not very far judging from the previous references).
Lastly, another alternative could be extending LLVM with a custom intrinsic (see memcpy for example), which inevitably goes through specific definition per target.
For actually adapting your integer type creation you could use the sizeof operator along with the use of CHAR_BIT, in order to provide the correct number of bits in the getIntNType call.
This will get you as far as using the right size for integer type on the platform where your module pass is built on.
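A minimal sketch of the sizeof/CHAR_BIT computation described above, written in plain C so that it does not depend on LLVM headers. The function name `size_t_bits` is made up for this example; the value it returns is what you would hand to the integer-type creation call. Note that it reflects the platform the code is compiled for, which is not necessarily the target of the generated IR.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* C requires size_t to be able to represent at least 65535. */
static_assert(sizeof(size_t) * CHAR_BIT >= 16, "size_t is narrower than C allows");

/* Width in bits of size_t on the platform this code is compiled for. */
unsigned size_t_bits(void)
{
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}
```

On a typical 64-bit desktop platform this yields 64 and on a typical 32-bit one 32, but the point of computing it is not to hard-code either value.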
For detecting a type's size 'dynamically' on the platform where your pass is being run, I know of no other way than providing that info in some sort of configuration file.
However, this can be automated. Following the example of various build systems (e.g. cmake, which is also used by LLVM), you can craft a simple program that is compiled as part of the build and generates that information automatically.
To that end, and to make this as portable as possible and avoid reinventing the wheel, you can use cmake's CheckTypeSize module.
I have limited knowledge in assembly, but I can at least read through it and match with the corresponding C or C++ code. I can see that the function arguments are passed either by pushing them to the stack or by registers, and the function body uses some registers to do its operations. But it also seems to use the same registers that were used in the caller. Does this mean that the caller has no guarantee that the state of the registers will be the same after a function call? What if the whole body of the function is unknown during compilation? How does the compiler deal with this?
The compiler-generated assembler code follows some calling convention. A calling convention typically specifies
how arguments are passed to the function
how return values are passed from the called function to the caller
which registers should be saved within a function call and which can be modified
If all functions being called follow the same calling convention, no problems with using the same registers should occur.
As the comments allude to, the fact is that there is no standard for this. It is left entirely to the implementors of the particular C++ compiler you are using.
A more explicit question would be something like this: "when compiling on version N of compiler A with compiler options B, calling a function signature of C, for target CPU D, using ABI E, what are the guarantees vis-a-vis register preservation?"
In that case an expert (or the manual) on that particular toolset can answer.
As you can probably infer, for any kind of industrial-strength project, it's the wrong question to ask, because as your compiler evolves the answer will change, and you don't want that fact to impact the reliability of your program.
It's a good question, because it's nice to know what the compiler is doing under the hood - it aids learning.
But on the whole, the golden rule is to express clear uncomplicated logic to the compiler in your program, and allow the compiler to handle the details of turning that logic into optimised machine code, at which modern compilers are excellent.
I know that __stdcall functions can't have ellipses, but I want to be sure there are no platforms that support the stdarg.h functions for calling conventions other than __cdecl or __stdcall.
The calling convention has to be one where the caller clears the arguments from the stack (because the callee doesn't know what will be passed).
That doesn't necessarily correspond to what Microsoft calls "__cdecl" though. Just for example, on a SPARC, it'll normally pass the arguments in registers, because that's how the SPARC is designed to work -- its registers basically act as a call stack that gets spilled to main memory if the calls get deep enough that they won't fit into registers anymore.
Though I'm less certain about it, I'd expect roughly the same on IA64 (Itanium) -- it also has a huge register set (a couple hundred if memory serves). If I'm not mistaken, it's a bit more permissive about how you use the registers, but I'd expect it to be used similarly at least a lot of the time.
Why does this matter to you? The point of using stdarg.h and its macros is to hide differences in calling convention from your code, so it can work with variable arguments portably.
Edit, based on comments: Okay, now I understand what you're doing (at least enough to improve the answer). Given that you already (apparently) have code to handle the variations in the default ABI, things are simpler. That only leaves the question of whether variadic functions always use the "default ABI", whatever that happens to be for the platform at hand. With "stdcall" and "default" as the only options, I think the answer to that is yes. Just for example, on Windows, wsprintf and wprintf break the rule of thumb and use the cdecl calling convention instead of stdcall.
The most definitive way that you can determine this is to analyze the calling conventions. For variadic functions to work, your calling convention needs a couple of attributes:
The callee must be able to access the parameters that aren't part of the variable argument list from a fixed offset from the top of the stack. This requires that the compiler push the parameters onto the stack from right to left. (This includes such things as the first parameter to printf, the format specification. Also, the address of the variable argument list itself must also be derived from a known location.)
The caller must be responsible for removing the parameters off the stack once the function has returned, because only the compiler, while generating the code for the caller, knows how many parameters were pushed onto the stack in the first place. The variadic function itself does not have this information.
stdcall won't work because the callee is responsible for popping parameters off the stack. In the old 16-bit Windows days, pascal wouldn't work because it pushed parameters onto the stack from left to right.
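To make the second requirement concrete, here is a minimal sketch of a variadic function using stdarg.h. The function name `sum_ints` and its count-first interface are made up for this example. The callee has no way to discover how many arguments were actually passed; it relies entirely on what the caller tells it, which is why only the caller can know how much to clean up afterwards.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments. Nothing in the call itself tells the callee
   how many arguments follow; it trusts the fixed `count` parameter. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list args;
    int total = 0;

    va_start(args, count);            /* locate the variable part after `count` */
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);   /* read the next argument as an int */
    }
    va_end(args);

    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` returns 6. Passing a count larger than the number of arguments actually supplied is undefined behaviour, which is the same lack of information that rules out callee cleanup.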
Of course, as the other answers have alluded to, many platforms don't give you any choice in terms of calling convention, making this question irrelevant for those ones.
Consider the following function on an x86 system:
void __stdcall something(char *, ...);
The function declares itself as __stdcall, which is a callee-clean convention. But a variadic function cannot be callee-clean since the callee does not know how many parameters were passed, so it doesn’t know how many it should clean.
The Microsoft Visual Studio C/C++ compiler resolves this conflict by silently converting the calling convention to __cdecl, which is the only supported variadic calling convention for functions that do not take a hidden this parameter.
Why does this conversion take place silently rather than generating a warning or error?
My guess is that it’s to make the compiler options /Gr (set default calling convention to __fastcall) and /Gz (set default calling convention to __stdcall) less annoying.
Automatic conversion of variadic functions to __cdecl means that you can just add the /Gr or /Gz command line switch to your compiler options, and everything will still compile and run (just with the new calling convention).
Another way of looking at this is not by thinking of the compiler as converting variadic __stdcall to __cdecl but rather by simply saying “for variadic functions, __stdcall is caller-clean.”
AFAIK, the diversity of calling conventions is unique to DOS/Windows on x86. Most other platforms had compilers come with the OS and standardize the convention.
Do you mean "platforms supported by MSVC" or as a general rule? Even if you confine yourself to the platforms supported by MSVC, you still have situations like IA64 and AMD64 where there is only "one" calling convention, and that calling convention is called __stdcall, but it's certainly not the same __stdcall you get on x86.
Here's my previous question about switching C callstacks. However, C++ uses a different calling convention (thiscall) and may require some different asm code. Can someone explain the differences and point to or supply some code snippets that switch C++ callstacks (preferably in GCC inline asm)?
Thanks,
James
The code given in the previous question should work fine.
The thiscall calling convention differs only in who is responsible for popping the arguments off the stack. Under the thiscall calling convention, the callee pops the arguments (and additionally, the this pointer is passed in ecx); under the C calling convention, the caller pops the arguments. This does not affect context switches.
However, if you're going to do context switches yourself, note that you need to save and restore the registers as well (probably on the stack) in addition to switching stacks.
Note, by the way, that C++ doesn't always use thiscall -- it's only used for methods with a fixed number of arguments (and apart from that, it's a Microsoftism... g++ doesn't use it).
Note the ABI for C++ is not explicitly defined.
The idea was that compiler manufacturers would be able to use the optimal calling convention for the situation and thus make C++ faster.
The downside of this is that each compiler has its own calling convention, so code from different compilers is not compatible (even code from different versions of the same compiler, or built with different optimization flags, can be incompatible).