How to make a new LLVM instruction? - llvm

I am trying to make the instruction
%a = i32 add %b, %c
into
%a = i32 mul %b, %c
I have been searching for hours but the answers I have discovered so far are answers that are related to creating new instructions defined in Instructions.h (ex AllocaInst) not mul, sdiv, div, instructions.
I have also read the https://llvm.org/docs/ProgrammersManual.html#creating-and-inserting-new-instructions
but I couldn't get the example "auto *newInst = new Instruction(...);"
to work because I didn't know what the parameters of the should be even after looking at
https://llvm.org/doxygen/classllvm_1_1Instruction.html#a2b30181f228bc6318130557d7ffca945
EDIT:
I think I have successfully created the instruction itself, but not I'm having trouble replacing it in the right way.
Instruction *ni = BinaryOperator::CreateMul(inst.getOperand(0), newvalue);
ReplaceInstWithInst(*i, ni);
where *i is from iterating over a basic block (ex for (auto &i : bb))

Solved!
Instruction *ni = BinaryOperator::CreateMul(inst.getOperand(0), newvalue);
bb.getInstList().insert(inst.getIterator(), ni);

Related

Insert an `__asm__` block to do a addition in very large numbers

I am doing a program, and at this point I need to make it efficient.
I am using a Haswell microarchitecture (64bits) and the 'g++'.
The objective is made use of an ADC instruction, until the loop ends.
//I removed every carry handlers from this preview, yo be more simple
size_t anum = ap[i], bnum = bp[i];
unsigned carry;
// The Carry flag is set here with an common addtion
anum += bnum;
cnum[0]= anum;
carry = check_Carry(anum, bnum);
for (int i=1; i<n; i++){
anum = ap[i];
bnum = bp[i];
//I want to remove this line and insert the __asm__ block
anum += (bnum + carry);
carry = check_Carry(anum, bnum);
//This block is not working
__asm__(
"movq -64(%rbp), %rcx;"
"adcq %rdx, %rcx;"
"movq %rsi, -88(%rbp);"
);
cnum[i] = anum;
}
Is the CF set only in the first addition? Or is it every time I do an ADC instruction?
I think that the problem is on the loss of the CF, every time the loop is done. If it is this the problem how I can solve it?
You use asm like this in the gcc family of compilers:
int src = 1;
int dst;
asm ("mov %1, %0\n\t"
"add $1, %0"
: "=r" (dst)
: "r" (src));
printf("%d\n", dst);
That is, you can refer to variables, rather than guessing where they might be in memory/registers.
[Edit] On the subject of carries: It's not completely clear what you are wanting, but: ADC takes the CF as an input and produces it as an output. However, MANY other instructions muck with the flags, (such as likely those used by the compiler to construct your for-loop), so you probably need to use some instructions to save/restore the CF (perhaps LAHF/SAHF).

How to atomically add and fetch a 128-bit number in C++?

I use Linux x86_64 and clang 3.3.
Is this even possible in theory?
std::atomic<__int128_t> doesn't work (undefined references to some functions).
__atomic_add_fetch also doesn't work ('error: cannot compile this atomic library call yet').
Both std::atomic and __atomic_add_fetch work with 64-bit numbers.
It's not possible to do this with a single instruction, but you can emulate it and still be lock-free. Except for the very earliest AMD64 CPUs, x64 supports the CMPXCHG16B instruction. With a little multi-precision math, you can do this pretty easily.
I'm afraid I don't know the instrinsic for CMPXCHG16B in GCC, but hopefully you get the idea of having a spin loop of CMPXCHG16B. Here's some untested code for VC++:
// atomically adds 128-bit src to dst, with src getting the old dst.
void fetch_add_128b(uint64_t *dst, uint64_t* src)
{
uint64_t srclo, srchi, olddst[2], exchlo, exchhi;
srchi = src[0];
srclo = src[1];
olddst[0] = dst[0];
olddst[1] = dst[1];
do
{
exchlo = srclo + olddst[1];
exchhi = srchi + olddst[0] + (exchlo < srclo); // add and carry
}
while(!_InterlockedCompareExchange128((long long*)dst,
exchhi, exchlo,
(long long*)olddst));
src[0] = olddst[0];
src[1] = olddst[1];
}
Edit: here's some untested code going off of what I could find for the GCC intrinsics:
// atomically adds 128-bit src to dst, returning the old dst.
__uint128_t fetch_add_128b(__uint128_t *dst, __uint128_t src)
{
__uint128_t dstval, olddst;
dstval = *dst;
do
{
olddst = dstval;
dstval = __sync_val_compare_and_swap(dst, dstval, dstval + src);
}
while(dstval != olddst);
return dstval;
}
That isn't possible. There is no x86-64 instruction that does a 128-bit add in one instruction, and to do something atomically, a basic starting point is that it is a single instruction (there are some instructions which aren't atomic even then, but that's another matter).
You will need to use some other lock around the 128-bit number.
Edit: It is possible that one could come up with something that uses something like this:
__volatile__ __asm__(
" mov %0, %%rax\n"
" mov %0+4, %%rdx\n"
" mov %1,%%rbx\n"
" mov %1+4,%%rcx\n"
"1:\n
" add %%rax, %%rbx\n"
" adc %%rdx, %%rcx\n"
" lock;cmpxcchg16b %0\n"
" jnz 1b\n"
: "=0"
: "0"(&arg1), "1"(&arg2));
That's just something I just hacked up, and I haven't compiled it, never mind validated that it will work. But the principle is that it repeats until it compares equal.
Edit2: Darn typing too slow, Cory Nelson just posted the same thing, but using intrisics.
Edit3: Update loop to not unnecessary read memory that doesn't need reading... CMPXCHG16B does that for us.
Yes; you need to tell your compiler that you're on hardware that supports it.
This answer is going to assume you're on x86-64; there's likely a similar spec for arm.
From the generic x86-64 microarchitecture levels, you'll want at least x86-64-v2 to let the compiler know that you have the cmpxchg16b instruction.
Here's a working godbolt, note the compiler flag -march=x86-64-v2:
https://godbolt.org/z/PvaojqGcx
For more reading on the x86-64-psABI, the spec is published here.

StackDump while removing instruction from LLVM IR

Can anyone please tell me how to remove an instruction from IR permanently. I get a StackDump when I try to delete a mul instruction with the the following code:
Instruction *mul = BinaryOperator::CreateMul(op1,op1,"mul",tmp); //Creating new mul instruction
I2 = cast<Instruction>(i); //i is an Instruction iterator.
I2->replaceAllUsesWith(mul);
I2->eraseFromParent(); //Getting StackDump for this instruction

Reading from memory mapped register

The processor architecture I am working with has a time tag counter that I wanna read out to make performance measurements. The time tag counter is memory mapped to the
address 0x90000008. I used the following routine to read the value from the time
tage counter, however the difference in the print out is always zero. Anyone an idea
what I am missing?
char* const ADDR = (char *) 0x90000008;
unsigned long cycle_count_val() {
unsigned long res;
asm volatile (
"set ADDR, %%l0 \n\t"
"ld [%%l0], %0 \n\t"
: "=r" (res)
:
: "%l0"
);
return res;
}
....
unsigned long start = cycle_count_val();
execute_benchmark();
unsigned long end = cycle_count_val();
printf("Benchmark performance(in clock cycles) = %ld \r\n", end-start);
Many thanks for your help,
Phil
I don't think you need to resort to assember - why can't you just read from:
uint32_t tbr = (*((uint32_t volatile *) 0x90000008))

How to detect what CPU is being used during runtime?

How can I detect which CPU is being used at runtime ? The c++ code needs to differentiate between AMD / Intel architectures ? Using gcc 4.2.
The cpuid instruction, used with EAX=0 will return a 12-character vendor string in EBX, EDX, ECX, in that order.
For Intel, this string is "GenuineIntel". For AMD, it's "AuthenticAMD". Other companies that have created x86 chips have their own strings.The Wikipedia page for cpuid has many (all?) of the strings listed, as well as an example ASM listing for retrieving the details.
You really only need to check if ECX matches the last four characters. You can't use the first four, because some Transmeta CPUs also start with "Genuine"
For Intel, this is 0x6c65746e
For AMD, this is 0x444d4163
If you convert each byte in those to a character, they'll appear to be backwards. This is just a result of the little endian design of x86. If you copied the register to memory and looked at it as a string, it would work just fine.
Example Code:
bool IsIntel() // returns true on an Intel processor, false on anything else
{
int id_str; // The first four characters of the vendor ID string
__asm__ ("cpuid":\ // run the cpuid instruction with...
"=c" (id_str) : // id_str set to the value of EBX after cpuid runs...
"a" (0) : // and EAX set to 0 to run the proper cpuid function.
"eax", "ebx", "edx"); // cpuid clobbers EAX, ECX, and EDX, in addition to EBX.
if(id_str==0x6c65746e) // letn. little endian clobbering of GenuineI[ntel]
return true;
else
return false;
}
EDIT: One other thing - this can easily be changed into an IsAMD function, IsVIA function, IsTransmeta function, etc. just by changing the magic number in the if.
If you're on Linux (or on Windows running under Cygwin), you can figure that out by reading the special file /proc/cpuinfo and looking for the line beginning with vendor_id. If the string is GenuineIntel, you're running on an Intel chip. If you get AuthenticAMD, you're running on an AMD chip.
void get_vendor_id(char *vendor_id) // must be at least 13 bytes
{
FILE *cpuinfo = fopen("/proc/cpuinfo", "r");
if(cpuinfo == NULL)
; // handle error
char line[256];
while(fgets(line, 256, cpuinfo))
{
if(strncmp(line, "vendor_id", 9) == 0)
{
char *colon = strchr(line, ':');
if(colon == NULL || colon[1] == 0)
; // handle error
strncpy(vendor_id, 12, colon + 2);
fclose(cpuinfo);
return;
}
}
// if we got here, handle error
fclose(cpuinfo);
}
If you know you're running on an x86 architecture, a less portable method would be to use the CPUID instruction:
void get_vendor_id(char *vendor_id) // must be at least 13 bytes
{
// GCC inline assembler
__asm__ __volatile__
("movl $0, %%eax\n\t"
"cpuid\n\t"
"movl %%ebx, %0\n\t"
"movl %%edx, %1\n\t"
"movl %%ecx, %2\n\t"
: "=m"(vendor_id), "=m"(vendor_id + 4), "=m"(vendor_id + 8) // outputs
: // no inputs
: "%eax", "%ebx", "%edx", "%ecx", "memory"); // clobbered registers
vendor_id[12] = 0;
}
int main(void)
{
char vendor_id[13];
get_vendor_id(vendor_id);
if(strcmp(vendor_id, "GenuineIntel") == 0)
; // it's Intel
else if(strcmp(vendor_id, "AuthenticAMD") == 0)
; // it's AMD
else
; // other
return 0;
}
On Windows, you can use the GetNativeSystemInfo function
On Linux, try sysinfo
You probably should not check at all. Instead, check whether the CPU supports the features you need, e.g. SSE3. The differences between two Intel chips might be greater than between AMD and Intel chips.
You have to define it in your Makefile arch=uname -p 2>&1 , then use #ifdef i386 some #endif for diferent architectures.
I have posted a small project:
http://sourceforge.net/projects/cpp-cpu-monitor/
which uses the libgtop library and exposes data through UDP. You can modify it to suit your needs. GPL open-source. Please ask if you have any questions regarding it.