Live range (interval) analysis in LLVM

How does llvm compute the live ranges (intervals) of its temporaries?
Here is an example C file:
$ cat main.c
int main()
{
    int i = 200;
    int j = 300;
    while (j)
    {
        i = i + 1;
        j = j - 1;
    }
    return 0;
}
I execute the following commands, and then examine both main.ll and main.mem2reg.ll. I'm probably wrong, but it looks like computing live ranges is much easier on main.ll(?). Is mem2reg an essential pass when computing live ranges, or is it just a nice-to-have for certain optimizations?
$ clang -c -emit-llvm -O0 main.c -o main.bc
$ opt -instnamer main.bc -o main.bc
$ opt -mem2reg main.bc -o main.mem2reg.bc
$ llvm-dis main.bc
$ llvm-dis main.mem2reg.bc

To perform live range analysis, you may need some background knowledge on liveness analysis, or you can check out Cranelift's annotations.
The mem2reg pass just transforms the IR into SSA form; I don't think it is necessary for live range analysis.
I have a simple implementation of live variable analysis with LLVM; please check out my GitHub repo: https://github.com/lijiansong/clang-llvm-tutorial/tree/master/live-variable-analysis

Related

Core latency testing ARMv8.1

There is an interesting article about AWS's ARMv8.1 Graviton 2 offering.
The article includes CPU-coherency tests that I am trying to reproduce.
There is a C++ repo on GitHub named core-latency that uses the Nonius micro-benchmarking framework.
I managed to replicate the first test without atomic instructions using the command below to compile:
$ g++ -std=c++11 -Wall -pthread -O3 -Iinclude -o core-latency main.cpp -march=armv8-a
The article claims that ARMv8.1 uses atomic CAS operations and has much better performance, and it provides test results to match.
I tried to repeat it compiling with ARMv8.1, ARMv8.2, and ARMv8.3. Sample commands for compilation are below:
$ g++ -std=c++11 -Wall -pthread -O3 -Iinclude -o core-latency main.cpp -march=armv8.1-a+lse
$ g++ -std=c++11 -Wall -pthread -O3 -Iinclude -o core-latency main.cpp -march=armv8.2-a+lse
$ g++ -std=c++11 -Wall -pthread -O3 -Iinclude -o core-latency main.cpp -march=armv8.3-a+lse
None of these improved the performance, so I generated the assembly code using these commands:
g++ -std=c++11 -Wall -pthread -O3 -Iinclude -S main.cpp -march=armv8.1-a+lse
g++ -std=c++11 -Wall -pthread -O3 -Iinclude -S main.cpp -march=armv8.2-a+lse
g++ -std=c++11 -Wall -pthread -O3 -Iinclude -S main.cpp -march=armv8.3-a+lse
I searched the generated assembly and cannot find any CAS operations.
I also tried different compilation variations with and without "lse" and "-moutline-atomics".
I am not a C++ expert and I have a very basic understanding of it.
My guess is that the code needs some changes to use atomic instructions.
Tests are executed on an m6g.16xlarge EC2 instance in AWS running Ubuntu 20.04.
So if someone can check the core-latency code and give some insight into how to get it to compile to CAS instructions, that would be a great help.
After doing some more experiments, I found the problem.
The code snippet below performs two steps:
making a comparison first (whether state equals Ping)
calling the class method set to do an atomic store operation.
Code snippet from core-latency:
if (state == Ping)
    sync.set(Pong);
...
void set(State new_state)
{
    state.store(new_state);
}
None of this code ever compiles to a CAS instruction. If you want an atomic compare-and-swap operation, you need to use the relevant method on std::atomic.
I have written below a sample code for experimenting:
#include <atomic>
#include <cstdio>

int main() {
    int expected = 0;
    int desired = 1;
    std::atomic<int> current;
    current.store(expected);
    printf("Before %d\n", current.load());
    while (!current.compare_exchange_weak(expected, desired));
    printf("After %d\n", current.load());
}
I compiled it for ARMv8.1 and can see that it uses a CAS instruction.
I compiled it for ARMv8.0 and can see that it does not (which is expected, as CAS is not supported in that version).
So if I want CAS instructions to be emitted, I need to call std::atomic::compare_exchange_weak or std::atomic::compare_exchange_strong; otherwise the compiler will not use CAS but will compile the comparison and the store separately.
In summary, I can rewrite the benchmark with atomic::compare_exchange_weak and see what results I am getting.
New update April 30
I have created a new version of the code with atomic compare-and-swap support.
It is available here: https://github.com/fuatu/core-latency-atomic
Here are the test results for instance m6g.16xlarge (ARM):
Without CAS: Average latency 245ns
With CAS: Average latency 39ns

Unable to link PAPI library with opt llvm

I am working on a project where I need to generate just the bitcode using clang, run some optimization passes using opt and then create an executable and measure its hardware counters.
I am able to link through clang directly using:
clang -g -O0 -w -I/opt/apps/papi/5.3.0/include -Wl,-rpath,$PAPI_LIB -L$PAPI_LIB \
-lpapi /scratch/02681/user/papi_helper.c prog.c -o a.out
However now I want to link it after using the front end of clang and applying optimization passes using opt.
I am trying the following way:
clang -g -O0 -w -c -emit-llvm -I/opt/apps/papi/5.3.0/include -Wl,-rpath,$PAPI_LIB -L$PAPI_LIB \
-lpapi /scratch/02681/user/papi_helper.c prog.c -o prog.o
llvm-link prog.o papi_helper.o -o prog-link.o
// run optimization passes
opt -licm prog-link.o -o prog-opt.o
llc -filetype=obj prog-opt.o -o prog-exec.o
clang prog-exec.o
After going through the above process I get the following error:
undefined reference to `PAPI_event_code_to_name'
It's not able to resolve the PAPI functions. Thanks in advance for any help.
Clearly, you need to add -lpapi to the last clang invocation. How else would the linker know about libpapi?

When to use the -g flag to GCC

I'm trying to force Valgrind to tell me what's wrong with my program. Every shred of documentation on the face of the Internet says that you must supply the -g option to GCC, but not a single document says whether you need this flag at compile-time or link-time (or both). So which is it?
The GNU ld documentation says that -g is ignored, so it doesn't make much sense to pass it at link time. In general you pass -g to gcc (which is really a front-end for the whole compilation process, not just a compiler) and it will take care of it.
GCC provides the -g flag to include debugging information. Consider an example file example.c:
#include <stdio.h>
/* Warning: This program is wrong on purpose. */
int main()
{
    int age = 10;
    int height;
    printf("I am %d years old.\n");
    printf("I am %d inches tall.\n", height);
    return 0;
}
By default, if you compile using, say, make example,
it will run the command
cc example.c -o example
Now run
cc -g example.c -o example1
and you will find that example1 is larger than example,
because the -g flag embedded the debugging information.
You do not pass -g to valgrind itself; -g is only needed when compiling (it is what lets valgrind map errors back to source lines).

How to replace llvm-ld with clang?

Summary: llvm-ld has been removed from the LLVM 3.2 release. I am trying to figure out how to use clang in its place in my build system.
Note that I figured out the answer to my own question while writing it but I am still posting it in case it is useful to anyone else. Alternative answers are also welcome.
Details:
I have a build process which first generates bitcode using clang++ -emit-llvm. I then link the bitcode files together with llvm-link, apply some standard optimization passes with opt, apply a custom compiler pass with another opt invocation, and run the standard optimization passes again with opt a third time. Finally I take the output from the last run of opt and use llvm-ld to link with the appropriate libraries and generate my executable. When I tried to replace llvm-ld with clang++ in this process, I got the error message: file not recognized: File format not recognized
To make this question more concrete I created a simplified example of what I am trying to do. First there are two files that I want to compile and link together
test1.cpp:
#include <stdio.h>

int getNum();

int main()
{
    int value = getNum();
    printf("value is %d\n", value);
    return 0;
}
test2.cpp
int getNum()
{
    return 5;
}
I executed the following sequence of commands:
clang++ -emit-llvm -c test1.cpp test2.cpp
llvm-link -o test.bc1 test1.o test2.o
opt test.bc1 -o test.bc2 -std-compile-opts
(Note that I am currently running llvm 3.1, but I'm trying to figure out the steps that will work for llvm 3.2. I assume that I should be able to make the LLVM 3.1 version work correctly using clang instead of llvm-ld)
Then if I run:
llvm-ld test.bc2 -o a.out -native
everything is fine and a.out prints out 5.
However, if I run:
clang++ test.bc2 -o a.out
Then I get the error message:
test.bc2: file not recognized: File format not recognized
clang-3: error: linker command failed with exit code 1 (use -v to see invocation)
Obviously I know that I can produce an executable file by running clang directly on the .cpp files. But I'm wondering what the best way to integrate clang with opt is.
The test case described in the question can be compiled using the following steps:
clang++ -emit-llvm -c test1.cpp test2.cpp
llvm-link -o test.bc1 test1.o test2.o
opt test.bc1 -o test.bc2 -std-compile-opts
llc -filetype=obj test.bc2 -o test.o
clang++ test.o
This produces a working a.out file.
It seems that llc is needed to convert the bitcode to machine code, which clang can then process as it normally would.
In general I've found that
llvm-ld x.bc y.bc
can be replaced with
llc x.bc
llc y.bc
clang x.s y.s

"thread-local storage not supported for this target", suitable #ifdef?

Since every compiler has its own version of thread local storage, I ended up creating a macro for it. The only problem now is GCC (with pthreads turned off), which gives me:
"thread-local storage not supported for this target"
Fair enough, given that pthreads are actually turned off in this case. The question is: is there a generic way of detecting this with some macro, e.g. #ifdef __GCC_XXX_NO_THREADS_XXX?
EDIT: See the accepted answer below. Also, here's my lazy solution:
$ touch test.c
$ gcc -E -dM test.c > out.1
$ gcc -pthread -E -dM test.c > out.2
$ diff out.*
28a29
> #define _REENTRANT 1
This is on Mac OS X. I am not sure if it's portable or anything...
Your compile command line either has -lpthread or it doesn't: you could include a -DHAVE_PTHREADS define there as well.
If you really want GCC/ELF-specific runtime detection, you could resort to weak refs:
#include <stdio.h>
#include <pthread.h>

extern void *pthread_getspecific(pthread_key_t key) __attribute__ ((weak));

int
main()
{
    if (pthread_getspecific)
        printf("have pthreads\n");
    else
        printf("no pthreads\n");
    return 0;
}
Here's what it looks like:
$ gcc -o x x.c
$ ./x
no pthreads
$ gcc -o x x.c -lpthread
$ ./x
have pthreads
If you use autoconf for your project you might find ax_tls.m4 useful.