How to compile an OpenCL kernel file (.cl) to an LLVM IR file - c++

How do I obtain an LLVM IR file (.ll) from an OpenCL kernel file with clang?
The solution in this link seems to work with some files, but for code that contains OpenCL vector types such as uchar4, it does not seem to work (it emits type errors).
Is there an easy way to do this, or is it not possible to obtain the LLVM IR form with clang?

At least on OS X, there's an LLVM-based offline compiler for OpenCL kernels, you can find it in the following location:
/System/Library/Frameworks/OpenCL.framework/Libraries/openclc
(It supports the --help command line argument, which shows the possible options.)
I'm not aware of any published source code for openclc, so I guess that means you can't use it on other platforms. But as far as I'm aware, there's no standardised binary format for OpenCL kernels anyway, so you couldn't achieve platform independence with it either.

Related

How does the C++ compiler know which CPU architecture is being used

With reference to: http://www.cplusplus.com/articles/2v07M4Gy/
During the compilation phase, the program is translated into low-level assembly code. The compiler takes the preprocessed file (without any directives) and generates an object file. The object file is in binary form, and each entry in it describes one low-level machine instruction.
Now, if I am correct, different CPU architectures work with different assembly languages/syntaxes.
My question is: how does the compiler come to know which assembly language syntax the source code has to be translated into? In other words, how does the C++ compiler know which CPU architecture is in the machine it is working on?
Is there any mapping used by the assembler w.r.t. the CPU architecture for generating assembly code for different CPU architectures?
N.B.: I am a beginner!
Each compiler needs to be "ported" to the given system. For each system supported, a "compiler port" needs to be programmed by someone who knows the system in-depth.
WARNING : This is extremely simplified
In short, there are three main parts to a compiler :
"Front-end" : This part reads the language (in this case c++) and converts it to a sort of pseudo-code specific to the compiler. (An Abstract Syntactic Tree, or AST)
"Optimizer/Middle-end" : This part takes the AST and make a non-architecture-dependant optimized one.
"Back-end" : This part takes the AST, and converts it to binary executable code, specific to the architecture you want to compile your language on.
When you download a c++ compiler for your platform, you, in fact, download the c++ frontend with the linux-amd64 backend, for example.
This coding architecture is extremely helpful, because it allows to port the compiler for another architecture without rewriting the whole parsing/optimizing thing. It also allows someone to create another optimizer, or even another frontend supporting a whole different language, and, as long as it outputs a correct AST, it will be compatible with every single backend ever written for this compiler.
Simply put, the knowledge of the target system is coded into the compiler.
So you might have a C compiler that generates SPARC binaries, and a C compiler that generates VAX binaries. They both accept the same input language (as defined in the C standard), but produce different programs from it.
Often we just refer to "the compiler", meaning the one that will generate binaries for our current environment.
In modern times, the distinction has become less obvious with compiler collections such as GCC. Now the "different compilers" are often the same compiler program, just set up with different configurations (these are the "target description files").
Just to complete the answers given here:
The target architecture is indeed coded into the specific compiler instance you're using. This is also important for a process called "cross-compiling": compiling, on one system, an executable that will run on another system/architecture.
Consider working on an embedded system-on-chip that uses a completely different instruction set from your own: you're working on an x86-64 Linux system, but need to compile a mobile app that will run on an ARM micro-processor, or some other architecture.
It would be unreasonable to compile your code on the target system, which might be so limited in CPU and memory that it can't feasibly run a compiler, so instead you use a GCC (or any other compiler) port for that target architecture on your favourite system.
It's also quite critical to remember that the entire tool-chain must be compatible with the target system, for instance when shared libraries such as libc come into play: the target OS could be a different release of Linux with different versions of common functions. In that case it's common to use tool-chains that contain all the necessary libraries, and to use something like chroot or mock to compile in the "target environment" from within your system.
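As a small illustration of the code-generation half of this: emitting assembly for a foreign architecture needs no target libraries at all, so it works even without an ARM sysroot installed (a sketch; the exact triple spelling may vary between toolchain versions):

```shell
echo 'int add(int a, int b) { return a + b; }' > add.c

# Same source file, two different instruction sets, chosen by flag alone.
clang -S -o add_host.s add.c                          # host's default target
clang -S -o add_arm.s --target=armv7a-none-eabi add.c # ARM code generation
```

Producing a complete runnable executable for the target would additionally require the target's libc and linker, which is where the full cross tool-chains mentioned above come in.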

What is the macro CV_OCL_RUN used for in OpenCV?

I was studying hog.cpp as implemented in OpenCV when I encountered the macro CV_OCL_RUN and was confused by it.
In hog.cpp, where detectMultiScale() is located, you can find CV_OCL_RUN with a method called ocl_detectMultiScale() inside it. Comparing detectMultiScale() and ocl_detectMultiScale(), not only their names but also their implementations are quite similar.
Here are my questions:
What is the macro CV_OCL_RUN used for? Is it for testing or some other purpose?
Since detectMultiScale() and ocl_detectMultiScale() are so similar in functionality, why is the latter embedded in the former? In what ways are they called?
Thanks in advance!
CV_OCL_RUN is for OpenCL code.
If your computer is not able to use OpenCL capabilities (no GPU or no OpenCL driver), the regular (CPU) code is run. You can also switch between the regular code and the OpenCL version in the code: if setUseOptimized() or setUseOpenCL() is set to false, the regular code will be used.
You can find the kernel code that will be run on the GPU device in the opencl directory.
PS: OpenCL is not only for GPUs.

What happens to sse commands on a chip that doesn't support them?

Let's say I write a function using _mm_fmadd_ss. As far as I know, most (if not all) AMD chips don't support that, or have their own version.
What happens to the software when it is run on one of these chips? Could I compile for multiple chips? How would the program choose between them?
At first I thought "oh, I'll just do a preprocessor #ifdef", but then I realized that only the code that passes those conditions during preprocessing makes it into the output.
A program will usually terminate, or won't execute at all, if it contains invalid instructions. One way to write portable SIMD code is to use gcc's vector extensions, but you still need to set a valid target architecture. On the other hand, if you run your code in a virtual machine that supports the instructions, the code may run just fine even if the host CPU does not support them. I regularly test NEON code on an x86 computer under QEMU, for example. Also, people have ported BOCHS to Android and run x86 code (+SSE) on ARM-powered mobile phones.

Generate binary code (shared library) from embedded LLVM in C++

I am working on a high-performance system written in C++. The process needs to be able to understand some complex logic (rules) at runtime, written in a simple language developed for this application. We have two options:
Interpret the logic: run an embedded interpreter and generate a dynamic function call which, when it receives data, works on the data based on the interpreted logic
Compile the logic into a plugin.so dynamic shared file, use dlopen and dlsym to load the plugin, and call the logic function at runtime
Option 2 looks to be really attractive as it will be optimized machine code, would run much faster than embedded interpreter in the process.
The options I am exploring are:
write a compile method: string compile( string logic, list & errors, list & warnings )
here the input logic is a string containing logic coded in our custom language
it generates LLVM IR; the return value of the compile method is the IR string
write a link method: bool link(string ir, string filename, list & errors, list & warnings)
for the link method I searched the LLVM documentation, but I have not been able to find out whether it is possible to write such a method
If I am correct, LLVM IR is converted to LLVM bitcode or assembly code; then either the LLVM JIT is used to run it in JIT mode, or the GNU assembler is used to generate native code.
Is it possible to find a function in LLVM which does that? It would be much nicer if it were all done from within the code, rather than using a system command from C++ to invoke "as" to generate the plugin.so file for my requirement.
Please let me know if you know of any way I can generate a shared-library native binary from my process at runtime.
llc is an LLVM tool that translates LLVM IR to binary code. I think that is all you need.
Basically, you can produce your LLVM IR the way you want and then call llc on your IR.
You can call it from the command line, or you can look at the implementation of llc to find out how it works, and do the same in your own programs.
Here is a useful link:
http://llvm.org/docs/CommandGuide/llc.html
I hope it helps.

LLVM what is it and how can i use it to cross platform compilations

I was reading here and there about LLVM, which can be used to ease the pain of cross-platform compilation in C++. I tried to read the documents, but I didn't understand how to use it for real-life development problems. Can someone please explain to me, in simple words, how I can use it?
The key concept of LLVM is a low-level "intermediate" representation (IR) of your program.
This IR is at about the level of assembler code, but it contains more information to facilitate optimization.
The power of LLVM comes from its ability to defer compilation of this intermediate representation to a specific target machine until just before the code needs to run. A just-in-time (JIT) compilation approach can be used for an application to produce the code it needs just before it needs it.
In many cases, you have more information at the time the program is running than you do back at head office, so the program can be optimized much further.
To get started, you could compile a C++ program to a single intermediate representation, then compile it to multiple platforms from that IR.
You can also try the Kaleidoscope demo, which walks you through creating a new language without having to actually write a compiler, just write the IR.
In performance-critical applications, the application can essentially write its own code that it needs to run, just before it needs to run it.
Why don't you go to the LLVM website and check out all the documentation there. They explain in great detail what LLVM is and how to use it. For example they have a Getting Started page.
LLVM is, as its name says, a low-level virtual machine, and it has a code generator. If you want to compile to it, you can use either the GCC front end or clang, a C/C++ compiler for LLVM which is still a work in progress.
It's important to note that a bunch of information about the target comes from the system header files you use when compiling. LLVM does not defer resolving things like "size of pointer" or "byte layout", so if you compile with 64-bit headers for a little-endian platform, you cannot use that LLVM IR to target a 32-bit big-endian platform later.
There is a good chapter in a book that explains everything nicely: www.aosabook.org/en/llvm.html