Need to convert C++ templates to C99 code

I am porting CUDA code to OpenCL. CUDA allows C++ constructs like templates, while OpenCL is strictly C99. So what is the most painless way of porting templates to C?
I thought of using function pointers for the template parameters.

Before there were templates, there were preprocessor macros.
Search the web for "generic programming in C" for inspiration.
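As a minimal sketch of the macro approach (the macro name DEFINE_MAX_ELEMENT and the function it generates are made up for illustration), each expansion below stamps out one ordinary function whose name carries the element type, much like a template instantiation would:

```cpp
#include <cassert>

// DEFINE_MAX_ELEMENT is a hypothetical macro. Every expansion defines one
// plain function named max_element_<T>, so "instantiating" for a new type
// is a single extra line. The body uses nothing beyond C99-style code.
#define DEFINE_MAX_ELEMENT(T)                         \
    static T max_element_##T(const T *data, int n)    \
    {                                                 \
        assert(n > 0);                                \
        T best = data[0];                             \
        for (int i = 1; i < n; ++i)                   \
            if (data[i] > best) best = data[i];       \
        return best;                                  \
    }

DEFINE_MAX_ELEMENT(int)    // defines max_element_int
DEFINE_MAX_ELEMENT(float)  // defines max_element_float
```

Non-type template parameters (such as a block size) can be handled the same way by adding a macro argument and pasting it into the generated name.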

Here is the technique I used to convert some of the CUDA algorithms from the Modern GPU code to my GPGPU library VexCL (which supports OpenCL).
Each template function in the CUDA code is converted to two template functions in the OpenCL host code. The first host function (the 'name' function) returns the mangled name of the generated OpenCL function, so that functions with different template parameters have different names. The second host function (the 'source' function) returns the source code of the generated OpenCL function as a string. These functions are then used to generate the main kernel code.
Take, for example, the CTAMergeSort CUDA function template. It gets converted to the two overloads of the merge_sort function in the VexCL code. I call the 'source' function to add the function definition to the OpenCL kernel source, and then use the 'name' function to add its call to the kernel.
Note that the backend::source_generator in VexCL is used in order to generate either OpenCL or CUDA code transparently. In your case the code generation could be much simpler.
To make it all a bit more clear, here is the code that gets generated for the mergesort<256,11,int,float> template instance:
void mergesort_256_11_int_float
(
    int count,
    int tid,
    int * thread_keys0,
    local int * keys_shared0,
    float * thread_vals0,
    local float * vals_shared0
)
{
    if(11 * tid < count) odd_even_transpose_sort_11_int_float(thread_keys0, thread_vals0);
    thread_to_shared_11_int(thread_keys0, tid, keys_shared0);
    block_sort_loop_256_11_int_float(tid, count, keys_shared0, thread_vals0, vals_shared0);
}
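The 'name'/'source' pair described above could be sketched on the host side roughly as follows. This is not the VexCL implementation: type_name is a hypothetical helper, and the body emitted for thread_to_shared is an illustrative guess. Only the naming scheme (thread_to_shared_11_int) is taken from the generated code shown above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: maps a C++ type to its OpenCL C spelling.
template <typename T> std::string type_name();
template <> std::string type_name<int>()   { return "int"; }
template <> std::string type_name<float>() { return "float"; }

// 'name' function: builds one distinct identifier per template instance.
template <int VT, typename K>
std::string thread_to_shared_name() {
    std::ostringstream s;
    s << "thread_to_shared_" << VT << "_" << type_name<K>();
    return s.str();
}

// 'source' function: returns the OpenCL C text for this instance.
// The emitted function body is an assumption made for illustration.
template <int VT, typename K>
std::string thread_to_shared_source() {
    std::ostringstream s;
    s << "void " << thread_to_shared_name<VT, K>() << "(\n"
      << "    const " << type_name<K>() << " *thread_keys,\n"
      << "    int tid,\n"
      << "    local " << type_name<K>() << " *shared\n"
      << ")\n{\n"
      << "    for (int i = 0; i < " << VT << "; ++i)\n"
      << "        shared[" << VT << " * tid + i] = thread_keys[i];\n"
      << "}\n";
    return s.str();
}
```

A kernel generator would append thread_to_shared_source<11, int>() to the program text once, and write thread_to_shared_name<11, int>() wherever the call is needed.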

Take a look at Boost.Compute. It provides a C++, STL-like API for OpenCL.


How to convert function insertion module pass to intrinsic to inline

PROBLEM:
I currently have a traditional module instrumentation pass that
inserts new function calls into a given IR according to some logic
(inserted functions are external from a small lib that is later linked
to given program). Running experiments, my overhead is from
the cost of executing a function call to the library function.
What I am trying to do:
I would like to inline these function bodies into the IR of
the given program to get rid of this bottleneck. I assume an intrinsic
would be a clean way of doing this, since an intrinsic function would
be expanded to its function body when being lowered to ASM (please
correct me if my understanding is incorrect here, this is my first
time working with intrinsics/LTO).
Current Status:
My original library call definition:
void register_my_mem(void *user_vaddr){
... C code ...
}
So far:
I have created a def in: llvm-project/llvm/include/llvm/IR/IntrinsicsX86.td
let TargetPrefix = "x86" in {
def int_x86_register_mem : GCCBuiltin<"__builtin_register_my_mem">,
Intrinsic<[], [llvm_anyint_ty], []>;
}
Added another def in:
otwm/llvm-project/clang/include/clang/Basic/BuiltinsX86.def
TARGET_BUILTIN(__builtin_register_my_mem, "vv*", "", "")
Added my library source (*.c, *.h) to the compiler-rt/lib/test_lib
and added to CMakeLists.txt
Replaced the function insertion with trying to insert the intrinsic
instead in: llvm/lib/Transforms/Instrumentation/myModulePass.cpp
WAS:
FunctionCallee sm_func =
curr_inst->getModule()->getOrInsertFunction("register_my_mem",
func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
NEW:
Intrinsic::ID aREGISTER(Intrinsic::x86_register_my_mem);
Function *sm_func = Intrinsic::getDeclaration(currFunc->getParent(),
aREGISTER, func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
Questions:
If my logic for inserting the intrinsic functions shouldn't be a
module pass, where do I put it?
Am I confusing LTO with intrinsics?
Do I put my library function definitions into the following files as mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114322.html as for example EmitRegisterMyMem()?
clang/lib/CodeGen/CodeGenFunction.cpp - define llvm::Intrinsic::ID
clang/lib/CodeGen/CodeGenFunction.h - declare llvm::Intrinsic::ID
My LLVM compiles, so it is semantically correct, but currently when
trying to insert this function call, LLVM segfaults saying "Not a valid type for function argument!"
I'm seeing multiple issues here.
Indeed, you're confusing LTO with intrinsics. Intrinsics are special "functions" that are either expanded into special instructions by a backend or lowered to library function calls. That is not what you are trying to achieve. You don't need an intrinsic at all; you just need to inline the function call in question, either by hand (from your module pass) or via LTO.
The particular error occurs because you declare your intrinsic as taking an integer argument (and this is what the declaration would look like), but you are:
requesting the declaration of an overloaded intrinsic with an invalid type (I assume your func_type is a non-integer type)
passing a pointer argument
I hope this makes the issue clear.
See also: https://llvm.org/docs/LinkTimeOptimization.html
Thank you for clearing up the issue, @Anton Korobeynikov.
After reading your explanation, I also believe that I have to use LTO to accomplish what I am trying to do. I found this link especially useful: https://llvm.org/docs/LinkTimeOptimization.html. It seems that I am now on the right path.

Is it possible to run a piece of pure C++ code on a GPU

I don't know OpenCL very well, but I know its C/C++ API requires the programmer to provide OpenCL code as a string. Recently I discovered the ArrayFire library, which doesn't require string code to invoke some calculations. I wondered how it works (it is open source, but the code is a bit confusing). Would it be possible to write a parallel for with an OpenCL backend that invokes any piece of compiled (x86, for example) code, like the following:
template <typename F>
void parallel_for(int starts, int ends, F task) //API
{ /*some OpenCL magic */ }
//...
parallel_for(0, 255, [&tab](int i){ tab[i] *= 0.7; } ); //using
PS: I know I am probably 99% too optimistic.
You cannot really call C++ host code from the device using standard OpenCL.
You can use SYCL, the Khronos standard for single-source C++ programming. SYCL allows you to compile C++ directly into device code without requiring OpenCL strings. You can call any C++ function from inside a SYCL kernel (as long as its source code is available). SYCL.tech has more links and updated information.

Pass complex numbers to and from a DLL in LabVIEW

I am trying to interface this C++ code -- which implements functions necessary to calculate a Voigt line shape -- with LabVIEW (I'm currently running LV2009). I successfully compiled the code into a DLL, and I set up the Call Library Function Node to point to the DLL. However, the function expects a vector of type complex double and returns a vector of type complex double. Complex double is not, however, one of my choices for data type when setting up the function prototype.
Unfortunately, I do not speak C/C++, so I don't have any idea how I would go about modifying the code to get and return real doubles only. I have compiled the code into a MEX file to use with MATLAB, and pass complex numbers in and out with no problem, so I know the code works.
Is there a way to trick LabVIEW 2009 into passing complex numbers in and out of DLL functions? If not, will I gain this ability if I upgrade to the newest release? If not, is there a good basic guide to C++ that will teach me how to modify the function to accept and return the real and imaginary parts as separate vectors?
LabVIEW doesn't allow interfacing with C++ code, only C (or, if it's C++, the functions have to be declared extern "C" and use plain old data types).
I see that your library has C wrappers, but they use the C99 complex type, which LabVIEW doesn't understand.
However, LabVIEW can pass a complex data type to a function. To see how it's done, open the example named "Call DLL.vi" and select the complex data type to see the function prototype and VI. With luck, the C99 complex type has the same binary representation as the LabVIEW one. I didn't dig for the info, but it is quite possible.
If that's the case, go to church and be grateful to your Lord, and use the C wrapper to interface to it.
If it's not, find a tutorial about making a DLL with your compiler; it's not difficult, it just takes time. The DLL will take two doubles for each complex number and make the appropriate call to the real function.
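A rough sketch of such a wrapper is shown below. The names library_function and library_function_split are hypothetical, and the stand-in body (squaring each element) only exists so the example is complete; in practice the wrapper would forward to the actual Voigt routine. On Windows the wrapper would also need to be exported from the DLL, for example with __declspec(dllexport).

```cpp
#include <complex>
#include <vector>

// Stand-in for the library's real entry point (hypothetical name and body).
std::vector<std::complex<double>>
library_function(const std::vector<std::complex<double>> &z) {
    std::vector<std::complex<double>> out;
    out.reserve(z.size());
    for (const auto &v : z) out.push_back(v * v);
    return out;
}

// C-linkage wrapper using only plain doubles: real and imaginary parts
// travel as separate arrays, which LabVIEW can pass as ordinary 1D arrays.
extern "C" void library_function_split(const double *re_in, const double *im_in,
                                       double *re_out, double *im_out, int n) {
    if (n <= 0) return;
    std::vector<std::complex<double>> z(n);
    for (int i = 0; i < n; ++i)
        z[i] = std::complex<double>(re_in[i], im_in[i]);

    const std::vector<std::complex<double>> w = library_function(z);

    for (int i = 0; i < n; ++i) {
        re_out[i] = w[i].real();
        im_out[i] = w[i].imag();
    }
}
```

The caller allocates all four arrays with n elements, so no memory ownership crosses the DLL boundary.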

How to call a C function indirectly

Let's suppose I have the following function:
int func(int a, char* b, float c)
{
    return 42;
}
I am curious whether it is possible to call this function without:
explicitly calling it (func(1, "abc", 2.4))
creating a function pointer to it, and then calling it via the function pointer.
The function is written in C (or C++) and might be located either in a library (DLL on Windows) or somewhere compiled in the current application. For now let's assume there are no name mangling issues.
However, I know the following:
the name of the function.
the number and type of parameters as text based input (such as "int", "char*", "float").
its return type
I'm open to any suggestions, but I'm somewhat afraid of some lower level assembly hacks, since I'd like this to be as portable as possible.
I'd prefer a C solution, and I'd like to avoid boost::bind...
Edit - some clarifications ...
The one "calling" the "function" is a scripting language's compiled library (DLL). It loads the script (a source file), which has "bindings" to external "functions" (the ones I am trying to call). When the script reaches "call this external function", the library tries to call that function, which might be in a DLL or in the application that loaded the scripting language's DLL.
To call functions whose parameter types are not known at compile time, I fear you won't get around said "lower level assembly hacks".
If portability to architectures other than x86 or AMD64 isn't relevant, take a look at this wonderful library. It offers an OS-independent way of generating native code at runtime and should be the easiest way to fulfil your wishes.
It's still in beta, but I've been using it for a while now without encountering any problems.

Use a CPU function in CUDA

I would like to include a C++ function in a CUDA kernel, but this function is written for the CPU, like this:
inline float random(int rangeMin,int rangeMax){
    return rand(rangeMin,rangeMax);
}
Assume that the rand() function uses either curand.h or the Thrust CUDA library.
I thought of using a kernel function (with only one GPU thread) that would include this function as inline, so that the CUDA compiler would generate the binary for the GPU.
Is this possible? If so, I would like to include other inline functions written for the CPU in the CUDA kernel function.
Something like this:
-- InlineCpuFunction.h and InlineCpuFunction.cpp
-- CudaKernel.cuh and CudaKernel.cu (this one includes the above header and uses its function in the CUDA kernel)
If you need more explanation (as this may look confusing), please ask me.
You can tag the functions you want to use on both the device and the host with both the __host__ and __device__ decorators. That way they are compiled for both your CPU and your GPU.
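One common way to keep such a header usable from plain .cpp files as well is to hide the decorators behind a macro. The sketch below assumes a hypothetical macro name HOST_DEVICE and a hypothetical helper scale_to_range. The random sample itself is passed in as an argument, since a function compiled for both sides can only call code that is available on both sides.

```cpp
// HOST_DEVICE is a hypothetical macro name. When compiled by nvcc
// (__CUDACC__ is defined) it expands to both decorators; with an
// ordinary C++ compiler it expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Maps a sample u from the unit interval onto [range_min, range_max].
// u is assumed to come from whichever generator the caller uses.
HOST_DEVICE inline float scale_to_range(float u, int range_min, int range_max) {
    return range_min + u * (range_max - range_min);
}
```

With this in InlineCpuFunction.h, the same function can be called from host code and from a kernel in CudaKernel.cu.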