llvm: lower a 3-operand instruction to 2 operands

Currently LLVM's add, sub, ... instructions require 3 operands: dest, src1, src2.
How can I write a custom "add" instruction that only supports 2 operands?
E.g.: dest = dest + src1.
I tried this in the .td file, but it didn't work:
defm Reg: Instr<opcode, (outs RC:$dest), (ins RC:$A),
!strconcat(opcodeStr, " $dest, $dest, $A"),
[(set Ty:$dest, (opNode Ty:$dest, Ty:$A))]>;
It complains that "Input operand $dest occurs in pattern but not in operands list!"
Thanks.

You need to model it as a 3-operand instruction, but add a constraint that one of the source operands equals the destination. See the X86 backend for a typical example of how to do this.
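For reference, a sketch of what that can look like in the .td file, reusing the made-up names from the question (RC, Ty, opNode, opcodeStr), so treat it as an illustration rather than working backend code:
let Constraints = "$src1 = $dest" in
defm Reg : Instr<opcode, (outs RC:$dest), (ins RC:$src1, RC:$A),
                 !strconcat(opcodeStr, " $dest, $A"),
                 [(set Ty:$dest, (opNode Ty:$src1, Ty:$A))]>;
The pattern now only mentions operands that appear in the (outs)/(ins) lists, and the "$src1 = $dest" constraint tells the register allocator to assign the same register to both, which is how X86's two-address arithmetic instructions are defined.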

Related

What is the role of & in this piece of code?

I am trying to understand the dbns code in foam-extend, but I am having a bit of doubt about a specific part of the code given below, namely expressions such as
deltaRLeft & gradrho[own] or deltaRRight & gradU[nei]
I think the & used here is a reference operator, but if anyone can explain it in more detail, that would be helpful.
Flux::evaluateFlux
(
rhoFlux_[faceI],
rhoUFlux_[faceI],
rhoEFlux_[faceI],
rho_[own] + rhoLimiter[own]*(deltaRLeft & gradrho[own]),
rho_[nei] + rhoLimiter[nei]*(deltaRRight & gradrho[nei]),
U_[own] + cmptMultiply(ULimiter[own], (deltaRLeft & gradU[own])),
U_[nei] + cmptMultiply(ULimiter[nei], (deltaRRight & gradU[nei])),
T_[own] + TLimiter[own]*(deltaRLeft & gradT[own]),
T_[nei] + TLimiter[nei]*(deltaRRight & gradT[nei]),
R[own],
R[nei],
Cv[own],
Cv[nei],
Cp[own],
Cp[nei],
Sf[faceI],
magSf[faceI]
);
What exactly is the & doing here? Can it be explained in detail?
This part of the code is from dbns/numericFlux/numericFlux.C.
It's the bitwise and operator.
It compares each bit of the first operand to the corresponding bit of the second operand.
If both bits are 1, the result bit is set to 1; otherwise it is 0.
As an example:
  11001001
& 10111000
----------
= 10001000
I am afraid that the other answers here are mostly not applicable within the OpenFOAM context.
In the OpenFOAM context, & is the inner product when the operands are tensors; here, deltaRRight and gradT[nei] are in fact tensor objects.
Please look at the OpenFOAM Programmer's Guide, Sections 1.4.1 and 1.3.1.
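To illustrate how & can mean something other than bitwise AND, here is a minimal, self-contained sketch of operator overloading for an inner product; it is not OpenFOAM's actual tensor code, just the same idea in miniature:
#include <iostream>

// Toy 3-component vector; OpenFOAM's real vector/tensor classes are much richer.
struct Vec3 { double x, y, z; };

// Overload & as the inner (dot) product, the convention OpenFOAM follows
// for its vector and tensor types.
double operator&(const Vec3& a, const Vec3& b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

int main()
{
    Vec3 deltaR = {1.0, 0.0, 0.0}; // e.g. a displacement from cell centre to face
    Vec3 grad   = {2.0, 3.0, 4.0}; // e.g. a gradient
    std::cout << (deltaR & grad) << std::endl; // prints 2, the inner product
}
So deltaRLeft & gradrho[own] is calling such an overloaded operator, not the built-in bitwise AND.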
There are two different & operators.
The bitwise AND operator (&) compares each bit of the first operand to the corresponding bit of the second operand. If both bits are 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0.
Both operands to the bitwise AND operator must be of integral types.
For example:
#include <iostream>
using namespace std;
int main() {
    unsigned short a = 0xFFFF; // pattern 1111 ...
    unsigned short b = 0xAAAA; // pattern 1010 ...
    cout << hex << ( a & b ) << endl; // prints "aaaa", pattern 1010 ...
}

32-bit bitwise Exclusive-OR negation value of a binary packet

I am required to generate a 4-byte checksum which is defined as "a 32-bit bitwise exclusive-OR negation value" of some piece of binary data. I am re-writing the encode/decode sections of a certain MML interface for a billing system in Erlang.
The C/C++ version of such a function is shown below:
Function: GetChkSum
Description: A 32-bit bitwise Exclusive-OR negation value of "message header + session header + transaction header + operation information".
Input:
  len indicates the total length of "message header + session header + transaction header + operation information".
  buf indicates the string consisting of message header, session header, transaction header, and operation information.
Output:
  res indicates the result of the 32-bit bitwise Exclusive-OR negation value.
void GetChkSum(Int len, PSTR buf, PSTR res)
{
    memset(res, 0, MSG_CHKSUM_LEN);
    for(int i=0; i<len; i+=4)
    {
        res[0]^=(buf+i)[0];
        res[1]^=(buf+i)[1];
        res[2]^=(buf+i)[2];
        res[3]^=(buf+i)[3];
    };
    res[0]=~res[0];
    res[1]=~res[1];
    res[2]=~res[2];
    res[3]=~res[3];
};
I am required to re-write this in Erlang. How can I do this?
There is no difficulty doing an xor in Erlang (the operator to use is bxor and it works on integers). But to write any code you need to define the "format" of the input and output first. From your example I guess it may be ASCII codes stored in a binary, or a string?
Once you have defined the input type, the result can be evaluated with a function of this type:
negxor(<<>>, R) -> int_to_your_result_type(bnot(R) band 16#FFFFFFFF);
negxor(<<H:32, Q/binary>>, R) -> negxor(Q, R bxor H).
and you can call it with negxor(your_input_to_binary(Input),0).

What's the fastest way to shift several unsigned chars at once so they flow from one to the other?

We have the following array...
unsigned char pixelData[16][5]
...which represents a 40x16 1bpp display used to feed an LED matrix.
Shifting our drawing up or down a single pixel at a time is easy, as we just change the first indexer. However, since the second indexer represents an entire byte, not a single pixel, we're not sure of the most efficient way to shift the data left or right.
My first thought would be to do it pixel-by-pixel, but that of course would be pretty slow, requiring eight operations per byte.
My next thought was to shift each byte left or right, storing the shifted-out bit for insertion into the byte next to it. That drops it to three ops per byte (store, shift, replace), but it also makes the logic complex if you wanted to shift more than a single pixel at a time, as you'd also need a mask.
My thought after that was to cast specific members of the array to uint16_t so I could work on them with a 16-bit length instead of 8, which would reduce the steps from six (store/shift/replace x 2) to four (cast/store/shift/replace)... but I still have to think there's a faster way to do this. Again though, if you're shifting more than one place, you're right back to the masking issue.
That said, what's the most efficient way to shift an entire row of bytes?
If it matters, this is on an Atmel chip, specifically on Arduinos.
If you're open to including some assembler in your code, Atmel AVR microcontrollers have the perfect instructions for this purpose: ROR and ROL. They shift by one bit through the carry flag: the bit shifted out is kept in the carry, and the previous carry is shifted in at the other end.
ROR(x): new Carry = x[b0]; x = x >> 1; x[b7] = old Carry
You would just have to execute this instruction 5 times, once per byte of a row, to get exactly what you're looking for.
EDIT: something like this in the Arduino IDE should do the trick (I haven't tested it):
// one row shown (row 0); note that the carry flag has to survive between these statements
asm ("rol %0" : "=r" (pixelData[0][0]) : "0" (pixelData[0][0]));
asm ("rol %0" : "=r" (pixelData[0][1]) : "0" (pixelData[0][1]));
asm ("rol %0" : "=r" (pixelData[0][2]) : "0" (pixelData[0][2]));
asm ("rol %0" : "=r" (pixelData[0][3]) : "0" (pixelData[0][3]));
asm ("rol %0" : "=r" (pixelData[0][4]) : "0" (pixelData[0][4]));
The bitflow would be:
ROL <value>:
# Carry Register = 8th bit of the given value
# value = value << 1
# 1st bit of the value = Previous Carry Register
ROL 0b10000001 # Result = 0b00000010 and Carry = 1
ROL 0b10000001 # Result = 0b00000011 and Carry = 1
ROL 0b00000001 # Result = 0b00000011 and Carry = 0
ROL 0b00000001 # Result = 0b00000010 and Carry = 0
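If you'd rather stay in plain C/C++ and keep the carry handling explicit, here is a minimal sketch of the per-row shift; the helper names and the bit layout (byte 0 holds the leftmost 8 pixels, most significant bit first) are assumptions, so adjust them to match your actual mapping:
// Shift one 5-byte (40-pixel) row left by one pixel; the top bit of each
// byte flows into the bottom of the byte to its left.
void shiftRowLeft(unsigned char row[5])
{
    for (int i = 0; i < 4; ++i)
        row[i] = (unsigned char)((row[i] << 1) | (row[i + 1] >> 7));
    row[4] = (unsigned char)(row[4] << 1);   // a blank pixel enters at the right edge
}

// Shift the whole 40x16 display one pixel to the left.
void shiftDisplayLeft(unsigned char pixelData[16][5])
{
    for (int r = 0; r < 16; ++r)
        shiftRowLeft(pixelData[r]);
}
That is a shift, an OR and a store per byte, with no extra masking needed even though the bits cross byte boundaries.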

What is this doing: "input >> 4 & 0x0F"?

I don't understand what this code is doing at all. Could someone please explain it?
long input; //just here to show the type, assume it has a value stored
unsigned int output( input >> 4 & 0x0F );
Thanks
It shifts the input 4 bits to the right, then masks the result so only the lower 4 bits remain.
Take this example 16-bit number (the dots are just for visual separation):
1001.1111.1101.1001 >> 4 = 0000.1001.1111.1101
0000.1001.1111.1101 & 0x0F = 1101 (or 0000.0000.0000.1101 to be more explicit)
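The same thing as a runnable snippet (the value of input is made up for illustration):
#include <iostream>

int main()
{
    long input = 0x12345678;
    // (input >> 4) drops the lowest 4 bits; & 0x0F then keeps only the next 4,
    // i.e. the second-lowest nibble of input.
    unsigned int output(input >> 4 & 0x0F);
    std::cout << std::hex << output << std::endl; // prints 7
}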
& is the bitwise AND operator. "& 0x0F" is often used to clear the upper 4 bits of a value, i.e. to ignore the leftmost 4 bits of a byte.
0x0F = 00001111, so a bitwise & with 0x0F retains only the rightmost 4 bits of the other operand, clearing the left 4 bits.
If the input has a value of 01010001, after doing & 0x0F we get 00000001, which is the pattern left after clearing the upper 4 bits.
Just as another example, this is a code I've used in a project:
Byte verflag = (Byte)(bIsAck & 0x0f) | ((version << 4) & 0xf0). Here I'm combining two values into a single Byte value to save space because it's being used in a packet header structure. bIsAck is a BOOL and version is a Byte whose value is very small. So both these values can be contained in a single Byte variable.
The first (high) nibble in the resulting variable contains the value of version and the second (low) nibble contains the value of bIsAck. At the receiving end I can retrieve the values into separate variables again, shifting right by 4 bits when reading back the value of version.
Hope this is somewhere near to what you asked for.
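A hedged sketch of that packing/unpacking idea with standard types (the names mirror the answer above, but the Byte/BOOL types are replaced with fixed-width equivalents for the example):
#include <cstdint>
#include <cassert>

int main()
{
    uint8_t version = 0x3; // small value, fits in a nibble
    uint8_t bIsAck  = 0x1; // boolean flag

    // Pack: version in the high nibble, bIsAck in the low nibble.
    uint8_t verflag = (uint8_t)(((version << 4) & 0xF0) | (bIsAck & 0x0F));

    // Unpack at the receiving end.
    uint8_t gotVersion = (verflag >> 4) & 0x0F;
    uint8_t gotAck     = verflag & 0x0F;

    assert(gotVersion == version && gotAck == bIsAck);
}
This is the same ">> 4" trick in reverse: the shift moves the high nibble down so the 0x0F mask can pick it out.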
That is doing a bitwise right shift of the contents of "input" by 4 bits, then doing a bitwise AND of the result with 0x0F (binary 1111).
What it does depends on the contents and type of "input". Is it an int? A long? A string (which would mean the shift and bitwise AND are being done on a pointer to the first byte).
Google for "c++ bitwise operations" for more details on what's going on under the hood.
Additionally, look at C++ operator precedence because the C/C++ precedence is not exactly the same as in many other languages.
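On the precedence point: in C and C++, >> binds tighter than &, so the expression groups as (input >> 4) & 0x0F, which a quick assertion can confirm:
#include <cassert>

int main()
{
    long input = 0xAB;
    assert((input >> 4 & 0x0F) == ((input >> 4) & 0x0F)); // same grouping
    assert((input >> 4 & 0x0F) == 0x0A);                  // high nibble of 0xAB
}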

Weird behavior of right shift operator (1 >> 32)

I recently faced a strange behavior using the right-shift operator.
The following program:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <stdint.h>
int foo(int a, int b)
{
    return a >> b;
}
int bar(uint64_t a, int b)
{
    return a >> b;
}
int main(int argc, char** argv)
{
    std::cout << "foo(1, 32): " << foo(1, 32) << std::endl;
    std::cout << "bar(1, 32): " << bar(1, 32) << std::endl;
    std::cout << "1 >> 32: " << (1 >> 32) << std::endl; // warning here
    std::cout << "(int)1 >> (int)32: " << ((int)1 >> (int)32) << std::endl; // warning here
    return EXIT_SUCCESS;
}
Outputs:
foo(1, 32): 1 // Should be 0 (but I guess I'm missing something)
bar(1, 32): 0
1 >> 32: 0
(int)1 >> (int)32: 0
What happens with the foo() function? I understand that the only difference between what it does and the last two lines is that the last two lines are evaluated at compile time. And why does it "work" if I use a 64-bit integer?
Any lights regarding this will be greatly appreciated !
Surely related, here is what g++ gives:
> g++ -o test test.cpp
test.cpp: In function 'int main(int, char**)':
test.cpp:20:36: warning: right shift count >= width of type
test.cpp:21:56: warning: right shift count >= width of type
It's likely the CPU is actually computing
a >> (b % 32)
in foo; meanwhile, the 1 >> 32 is a constant expression, so the compiler will fold the constant at compile-time, which somehow gives 0.
Since the standard (C++98 §5.8/1) states that
The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
there is no contradiction having foo(1,32) and 1>>32 giving different results.
On the other hand, in bar you provided a 64-bit unsigned value, and as 64 > 32 it is guaranteed that the result must be 1 / 2^32 = 0. Nevertheless, if you write
bar(1, 64);
you may still get 1.
Edit: The logical right shift (SHR) behaves like a >> (b % 32/64) on x86/x86-64 (Intel #253667, Page 4-404):
The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.
However, on ARM (armv6&7, at least), the logical right-shift (LSR) is implemented as (ARMISA Page A2-6)
(bits(N), bit) LSR_C(bits(N) x, integer shift)
assert shift > 0;
extended_x = ZeroExtend(x, shift+N);
result = extended_x<shift+N-1:shift>;
carry_out = extended_x<shift-1>;
return (result, carry_out);
where (ARMISA Page AppxB-13)
ZeroExtend(x,i) = Replicate('0', i-Len(x)) : x
This guarantees a right shift of ≥32 will produce zero. For example, when this code is run on the iPhone, foo(1,32) will give 0.
This shows that shifting a 32-bit integer by ≥ 32 bits is non-portable.
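If you need a result that is the same everywhere, the usual fix is to guard the count yourself rather than rely on what the hardware happens to do; a minimal sketch (the helper name is made up):
#include <cstdint>
#include <iostream>

// Well-defined for any count: shifting by 32 or more simply yields 0.
uint32_t safe_shr32(uint32_t a, unsigned count)
{
    return (count < 32) ? (a >> count) : 0;
}

int main()
{
    std::cout << safe_shr32(1, 31) << std::endl; // 0
    std::cout << safe_shr32(1, 32) << std::endl; // 0, without undefined behaviour
}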
OK. So it's in §5.8/1:
The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
So you have an Undefined Behaviour(tm).
What happens in foo is that the shift width is greater than or equal to the size of the data being shifted. In the C99 standard that results in undefined behaviour. It's probably the same in whatever C++ standard MS VC++ is built to.
The reason for this is to allow compiler designers to take advantage of any CPU hardware support for shifts. For example, the i386 architecture has an instruction to shift a 32-bit word by a number of bits, but the number of bits is held in a field of the instruction that is 5 bits wide. Most likely, your compiler is generating the instruction by taking your shift amount and masking it with 0x1F to get the shift count in the instruction. This means that shifting by 32 is the same as shifting by 0.
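Spelled out in code, the masking described above amounts to the following (assuming a 32-bit unsigned int; this only illustrates the hardware behaviour, it is not something the C++ standard guarantees for a plain a >> b):
unsigned shift_like_i386(unsigned a, unsigned count)
{
    // Only the low 5 bits of the count reach the shift instruction,
    // so a count of 32 behaves like a count of 0.
    return a >> (count & 0x1F);
}
// shift_like_i386(1, 32) == 1, matching the foo(1, 32) output above.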
I compiled it on 32-bit Windows using the VC9 compiler. It gave me a warning. Since sizeof(int) is 4 bytes on my system, the compiler is indicating that right shifting by 32 bits results in undefined behavior. Since it is undefined, you cannot predict the result. Just for checking, I right-shifted by 31 bits and all the warnings disappeared and the result was as expected (i.e. 0).
I suppose the reason is that the int type holds 32 bits (on most systems), but one bit is used for the sign as it is a signed type, so only 31 bits are used for the actual value.
The warning says it all!
But in fairness I got bitten by the same error once.
int a = 1;
cout << ( a >> 32);
is completely undefined. In fact, the compiler generally gives a different result than the runtime, in my experience. What I mean by this is that if the compiler can evaluate the shift expression at compile time, it may give you a different result from the same expression evaluated at runtime.
foo(1,32) performs a rotate shift, so bits that should disappear on the right reappear on the left. If you do it 32 times, the single bit set to 1 is back in its original position.
bar(1,32) is the same, but the bit ends up in the 64-32+1 = 33rd bit, which is above the range representable in a 32-bit int. Only the lowest 32 bits are kept, and they are all 0's.
1 >> 32 is evaluated by the compiler. No idea why gcc uses a non-rotating shift here and not in the generated code.
Same thing for ((int)1 >> (int)32).