In LLVM IR, if I declare printf as a single-argument function, I'm able to use it. However, if I declare it as vararg, it gives an error:
@msg = constant [13 x i8] c"hello world\0A\00"
declare i32 @printf(i8*) ; works
;declare i32 @printf(i8*, ...) ; error: '@printf' defined with type 'i32 (i8*, ...)*'
; call i32 @printf(i8* %msg)
define i32 @main () {
%msg = getelementptr [13 x i8]* @msg, i64 0, i64 0
call i32 @printf(i8* %msg)
ret i32 0
}
How do I tell LLVM IR that printf is vararg, but call it with only one argument?
Note this passage from the description of the call instruction in the LLVM Language Reference (emphasis mine):
'fnty': shall be the signature of the function being called. The argument types must match the types implied by this signature. This type can be omitted if the function is not varargs.
So if the function is variadic, you do need to provide the function type as part of the call instruction.
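Following the quoted passage, spelling out the full vararg function type in the call should make the example work. A minimal sketch in the same (older, typed-pointer) IR dialect the question uses; more recent LLVM versions write the call as `call i32 (i8*, ...) @printf(...)` without the trailing `*`:

```llvm
@msg = constant [13 x i8] c"hello world\0A\00"

declare i32 @printf(i8*, ...)

define i32 @main() {
  %ptr = getelementptr [13 x i8]* @msg, i64 0, i64 0
  call i32 (i8*, ...)* @printf(i8* %ptr)
  ret i32 0
}
```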
The below bitcast instruction is throwing me an Illegal Bitcast error, can someone point what the problem is?
%opencl.image1d_ro_t = type opaque
%struct.dev_image_t = type { i8*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
%astype = bitcast %opencl.image1d_ro_t addrspace(1)* %image to %struct.dev_image_t*
You're casting from address space 1 to the default address space, 0. That won't work, as the documentation says. Each address space is independent.
Address spaces are meant for things like programs that have some garbage-collected and some manually managed memory. The pointer points to two profoundly different kinds of memory.
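If converting between address spaces is genuinely intended and meaningful on your target (an assumption; for opaque image types it usually is not), LLVM has a dedicated instruction for it, `addrspacecast`, rather than `bitcast`:

```llvm
%astype = addrspacecast %opencl.image1d_ro_t addrspace(1)* %image to %struct.dev_image_t*
```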
When I was learning OCaml essentials, I was told that every function in OCaml is actually a function with only one parameter. A multi-argument function is actually a function that takes one argument and returns a function that takes the next argument and returns ....
This is currying, I got that.
So my question is:
case 1
if I do
let plus x y = x + y
When OCaml compiles this, will it change it to let plus = fun x -> fun y -> x + y?
or the other way around that
case 2
If I do
let plus = fun x -> fun y -> x + y
OCaml will convert it to let plus x y = x + y?
Which case is true? What's the benefit or optimisation the OCaml compiler gets in the correct case?
In addition, if case 2 is true, then what is the point of saying OCaml does currying? It actually does the opposite, right?
This question is actually related to Understand Core's `Fn.const`
Both let plus x y = x + y and let plus = fun x -> fun y -> x + y will be compiled to the same code:
camlPlus__plus:
leaq -1(%rax, %rbx), %rax
ret
Yes, exactly two assembler instructions, without any prologues and epilogues.
The OCaml compiler performs several steps of optimization, and actually "thinks" in different categories. For example, both functions are represented with the same lambda code:
(function x y (+ x y))
Judging from the lambda above, you could say that the OCaml compiler transforms both definitions to a non-curried version.
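You can also check in the toplevel that the two definitions are interchangeable: same type, and both support partial application.

```ocaml
let plus_a x y = x + y
let plus_b = fun x -> fun y -> x + y

(* both have type int -> int -> int and can be partially applied *)
let inc = plus_a 1
let () = assert (inc 41 = 42)
let () = assert (plus_b 40 2 = 42)
```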
Update
I would also like to add a few words about Core's const function. Suppose we have two semantically equivalent representations of the const function:
let const_xxx c = (); fun _ -> c
let const_yyy c _ = c
in a lambda form they will be represented as:
(function c (seq 0a (function param c))) ; const_xxx
(function c param c) ; const_yyy
So, as you can see, const_xxx is indeed compiled in a curried form.
But the most interesting question is why it is worth writing such obscure code. Maybe there are some clues in the assembly output (amd64):
camlPlus__const_xxx_1008:
subq $8, %rsp
.L101:
movq %rax, %rbx ; save c into %rbx (it was in %rax)
.L102:
subq $32, %r15 ; allocate memory for a closure
movq caml_young_limit(%rip), %rax ; check
cmpq (%rax), %r15 ; that we have memory, if not
jb .L103 ; then free heap and go back
leaq 8(%r15), %rax ; load closure address to %rax
movq $3319, -8(%rax)
movq camlPlus__fun_1027(%rip), %rdi
movq %rdi, (%rax)
movq $3, 8(%rax)
movq %rbx, 16(%rax) ; store parameter c in the closure
addq $8, %rsp
ret ; return the closure
.L103: call caml_call_gc@PLT
.L104: jmp .L102
What about const_yyy? It is compiled simply as:
camlPlus__const_yyy_1010:
ret
Just return the argument. So the point of the optimization appears to be that in const_xxx the closure creation is compiled inside the function and should be fast. const_yyy, on the other hand, doesn't expect to be called in a curried way, so if you call it without all the needed parameters, the compiler needs to emit code that creates a closure at the point of each partial application of const_yyy (i.e., to perform all the operations of const_xxx every time you call const_yyy x).
To conclude: the const optimization creates a function that is optimized for partial application. It comes with a cost, though. The non-optimized const function will outperform the optimized one when called with all parameters. (Actually, my compiler even dropped the call to const_yyy entirely when I applied it with two args.)
As far as the semantics of the OCaml language is concerned both of those definitions are completely equivalent definitions of a curried function. There's no such thing as a multi-argument function in the semantics of the OCaml language.
However, the implementation is a different matter. Specifically, the current implementation of the OCaml language supports multi-argument functions in its internal representation. When a curried function is defined a certain way (i.e. as let f x y = ... or let f = fun x -> fun y -> ...), it will be compiled to a multi-argument function internally. However, if it is defined differently (like let f x = (); fun y -> ... in the linked question), it will be compiled to a curried function. This is only an optimization and does not affect the semantics of the language in any way. All three ways of defining a curried function are semantically equivalent.
Regarding your specific question about what gets turned into what: since the transformation isn't from one piece of OCaml code into another piece of OCaml code, but rather from OCaml code to an internal representation, I think the most accurate way to describe it would be to say that the OCaml compiler turns both let plus x y = x + y and let plus = fun x -> fun y -> x + y into the same thing internally, not that it turns one into the other.
Both case 1 and case 2 are curried functions. Here is the non-curried version:
let plus (x, y) = x + y
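The difference shows up in the types: the tupled version takes a single pair and cannot be partially applied, while the curried version can.

```ocaml
let plus_curried x y = x + y     (* int -> int -> int *)
let plus_tupled (x, y) = x + y   (* int * int -> int *)

let inc = plus_curried 1         (* partial application: int -> int *)
let () = assert (inc 2 = 3)
let () = assert (plus_tupled (1, 2) = 3)
(* plus_tupled 1 would be a type error: it expects a pair *)
```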
Okay, I learned that the native compiler will optimize your code, which is what I expected it to do. But here is the bytecode compiler:
let plus1 x y = x + y
let plus2 = fun x y -> x + y
let plus3 = function x -> function y -> x + y
treated with ocamlc -c -dinstr temp.ml gives me:
branch L4
restart
L1: grab 1
acc 1
push
acc 1
addint
return 2
restart
L2: grab 1
acc 1
push
acc 1
addint
return 2
restart
L3: grab 1
acc 1
push
acc 1
addint
return 2
which means the result is exactly the same; it is only a syntactic difference. And the arguments are taken one by one.
Btw, one more syntax point: fun can be written with n arguments, function only with one.
From the conceptual point of view I would largely favor function x -> function y -> over the others.
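To illustrate that syntactic difference (and that function also allows pattern matching on its single argument):

```ocaml
let add3 = fun x y z -> x + y + z   (* fun: any number of arguments *)

let rec sum = function              (* function: one argument, with pattern matching *)
  | [] -> 0
  | x :: rest -> x + sum rest

let () = assert (add3 1 2 3 = 6)
let () = assert (sum [1; 2; 3] = 6)
```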
I understand the basic layout (given a typical C++ implementation) of class instance members, but say you have MyClass with int num as a member, and you create an instance of it: how is the specific address of the member in memory handled at run time?
I'll be clearer with an example:
class MyClass
{
int num;
int num2;
int num3;
public:
void setNum(); //always sets num to 10
};
Then you call setNum; how does it know what memory to set to 10?
The memory layout for MyClass might look like
class MyClass size(12):
+---
0 | num
4 | num2
8 | num3
+---
So is it as simple as: when setNum gets called with the hidden pointer to your instance of MyClass, the member access is written based on offset? For example myclasspointer+4?
EDIT for clarification: how does it decide where to write to? A failed copy-paste left the vftable in there. I totally imagine it's just going to be a known offset, right?
Or is it something more complex?
Apologies for the unclear terminology; I rarely know how to phrase a question right...
The compiler knows the contents of the class (or struct) and, most importantly, the offsets of the different member variables. The setNum function is given a this pointer as a "hidden" argument, and the compiler will take this and add the offset for num.
Exactly how this happens depends on the compiler. In LLVM, it would use a getelementptr VM instruction, which understands the structure and, given a base-address, adds the offset given by the index. This will then translate to some sort of instruction that takes the this and either a direct offset addition in a single instruction, or two instructions to load the pointer and then add the offset - depending a little bit on the architecture, and what the next instructions "need".
Since the num member is the first member in the struct, its offset will be zero, so on x86-64, compiled with clang++ -O1, we get this disassembly:
_ZN7MyClass6setNumEv: # @_ZN7MyClass6setNumEv
movl $10, (%rdi)
retq
In other words, move the number 10 into the address of this (in %rdi - first argument on a Linux machine).
The LLVM IR shows better what goes on:
%class.MyClass = type { i32, i32, i32 }
; Function Attrs: nounwind uwtable
define void @_ZN7MyClass6setNumEv(%class.MyClass* nocapture %this) #0 align 2 {
entry:
%num = getelementptr inbounds %class.MyClass* %this, i64 0, i32 0
store i32 10, i32* %num, align 4, !tbaa !1
ret void
}
The class contains 3 i32 (32 bit integers), and the function takes a this pointer, it then uses getelementptr to get the first element (element 0). Yes, there's one more argument than you'd expect. That's how LLVM works ;)
Then a store instruction for the value 10 into the %num calculated address.
If we change the code in setNum so that it stores 10 into num2 instead, we get:
define void @_ZN7MyClass6setNumEv(%class.MyClass* nocapture %this) #0 align 2 {
entry:
%num2 = getelementptr inbounds %class.MyClass* %this, i64 0, i32 2
store i32 10, i32* %num2, align 4, !tbaa !1
ret void
}
Note the change of the last number into getelementptr.
As assembly code it becomes:
_ZN7MyClass6setNumEv: # @_ZN7MyClass6setNumEv
movl $10, 8(%rdi)
retq
(As it currently stands, in Revision 2 of the original question, your class MyClass has a size of 12, 3 * 4 bytes, not 8 like your text says).
Compiler: gcc 4.7.1, 32bit, ubuntu
Here's an example:
#include <stdio.h>

int main(void)
{
unsigned int mem = 0;
__asm volatile
(
"mov ebx, esp\n\t"
"mov %0, [ds : ebx]\n\t"
: "=m"(mem)
);
printf("mem = 0x%08x\n", mem);
return 0;
}
gcc -masm=intel -o app main.c
Assembler messages: invalid use of register!
As far as I know, ds and ss point to the same segment. I don't know why I can't use the [ds : ebx] logical address for addressing.
Your code has a few problems:
One: the indirect memory reference should be:
mov %0, ds : [ebx]
That is, with the ds out of the brackets.
Two: A single instruction cannot have both origin and destination in memory; you have to use a register. The easiest constraint would be =g, which basically means "anything", but in your case that is not possible because esp cannot be moved directly to memory. You have to use =r.
Three: (?) You are clobbering the ebx register, so you should declare it as such, or else not use it that way. That will not prevent compilation, but it will make your code behave erratically.
In short:
unsigned int mem = 0;
__asm volatile
(
"mov ebx, esp\n\t"
"mov %0, ds : [ebx]\n\t"
: "=r"(mem) :: "ebx"
);
Or better not to force to use ebx, let instead the compiler decide:
unsigned int mem = 0, temp;
__asm volatile
(
"mov %1, esp\n\t"
"mov %0, ds : [%1]\n\t"
: "=r"(mem), "=&r"(temp)
);
BTW, you don't need the volatile keyword in this code. volatile prevents the asm block from being optimized away even when its output is not needed. If you write the code for its side effects, add volatile; if you write it to get an output, do not. That way, if the optimizing compiler determines that the output is not needed, it can remove the whole block.
Can ?: lead to less efficient code compared to if/else when returning an object?
Foo if_else()
{
if (bla)
return Foo();
else
return something_convertible_to_Foo;
}
If bla is false, the returned Foo is directly constructed from something_convertible_to_Foo.
Foo question_mark_colon()
{
return (bla) ? Foo() : something_convertible_to_Foo;
}
Here, the type of the expression after the return is Foo, so I guess first some temporary Foo is created if bla is false to yield the result of the expression, and then that temporary has to be copy-constructed to return the result of the function. Is that analysis sound?
A temporary Foo has to be constructed either way, and both cases are a clear candidate for RVO, so I don't see any reason to believe the compiler would fail to produce identical output in this case. As always, actually compiling the code and looking at the output is the best course of action.
It most definitely can when rvalue references are in play. When one of the two branches is an lvalue and the other an rvalue, the conditional operator cannot pick the right constructor for both: whichever way you go, at least one of them gets the wrong one. With the if statement, the code calls the correct move or copy constructor for each return.
While I appreciate assembly output, I still find it a bit "too" low-level :)
For the following code:
struct Foo { Foo(): i(0) {} Foo(int i): i(i) {} int i; };
struct Bar { Bar(double d): d(d) {} double d; operator Foo() const { return Foo(d); } };
Foo If(bool cond) {
if (cond) { return Foo(); }
return Bar(3);
}
Foo Ternary(bool cond) {
return cond ? Foo() : Bar(3);
}
Here is the LLVM IR generated by Clang
define i64 @If(bool)(i1 zeroext %cond) nounwind readnone {
entry:
%retval.0.0 = select i1 %cond, i64 0, i64 3 ; <i64> [#uses=1]
ret i64 %retval.0.0
}
define i64 @Ternary(bool)(i1 zeroext %cond) nounwind readnone {
entry:
%tmp.016.0 = select i1 %cond, i64 0, i64 3 ; <i64> [#uses=1]
ret i64 %tmp.016.0
}
By the way, the llvm try out demo now uses Clang :p
Since it is not the first time that this question comes up, in one form or another, I would like to point out that since semantically both forms are equivalent, there is no reason for a good compiler to treat them any differently as far as optimization and code generation are concerned. The ternary operator is just syntactic sugar.
As always in case of performance question: measure for the case at hand, there are too many things to take into account to do any prediction.
Here, I'd not be surprised that some compilers have problems with one form or the other while others get rapidly to the same internal representation and thus generate exactly the same code.
I will be surprised if there is any difference since the two are logically equivalent. But this will depend on the compiler.
It depends on the compiler. As far as I know, on most compilers if/else is translated to cleaner ASM code and is faster.
Edit: Assuming the code below
int a = 10;
int b = 20;
int c = 30;
int d = 30;
int y = 30;
y = (a > b) ? c : d;
if (a > b)
{
y = c;
}
else
{
y = d;
}
will be translated to assembly like this (unoptimized build):
y = (a > b) ? c : d;
008C13B1 mov eax,dword ptr [a]
008C13B4 cmp eax,dword ptr [b]
008C13B7 jle wmain+54h (8C13C4h)
008C13B9 mov ecx,dword ptr [c]
008C13BC mov dword ptr [ebp-100h],ecx
008C13C2 jmp wmain+5Dh (8C13CDh)
008C13C4 mov edx,dword ptr [d]
008C13C7 mov dword ptr [ebp-100h],edx
008C13CD mov eax,dword ptr [ebp-100h]
008C13D3 mov dword ptr [y],eax
if (a > b)
008C13D6 mov eax,dword ptr [a]
008C13D9 cmp eax,dword ptr [b]
008C13DC jle wmain+76h (8C13E6h)
{
y = c;
008C13DE mov eax,dword ptr [c]
008C13E1 mov dword ptr [y],eax
}
else
008C13E4 jmp wmain+7Ch (8C13ECh)
{
y = d;
008C13E6 mov eax,dword ptr [d]
008C13E9 mov dword ptr [y],eax
}