Create MPI reduction operator from C++ lambda function

I would like to write a function that wraps MPI_Allreduce and accepts any binary operator (as std::reduce does) to be used as the reduction operator by MPI. In particular, the user of such a function should be able to pass a lambda.
The following simple sample code illustrates this:
#include <mpi.h>
#include <iostream>
#include <functional>

template<typename BinaryOp>
void reduce(double *data, int len, BinaryOp op) {
    auto lambda = [op](void *a, void *b, int *len, MPI_Datatype *) {
        double *aa = static_cast<double *>(a);
        double *bb = static_cast<double *>(b);
        for (int i = 0; i < *len; ++i) {
            bb[i] = op(aa[i], bb[i]);
        }
    };
    // MPI_User_function is a typedef to: void (MPI_User_function) (void *a, void *b, int *len, MPI_Datatype *)
    MPI_User_function *opPtr = /* black magic code that gets the function pointer from the lambda */;
    MPI_Op mpiOp;
    MPI_Op_create(*opPtr, 1, &mpiOp);
    MPI_Allreduce(MPI_IN_PLACE, data, len, MPI_DOUBLE, mpiOp, MPI_COMM_WORLD);
    MPI_Op_free(&mpiOp);
}
int main() {
    MPI_Init(nullptr, nullptr);
    double data[4] = {1., 2., 3., 4.};
    reduce(data, 4, [](double a, double b) { return a + b; });
    int pRank;
    MPI_Comm_rank(MPI_COMM_WORLD, &pRank);
    if (pRank == 0) {
        for (int i = 0; i < 4; ++i) {
            std::cout << data[i] << " ";
        }
        std::cout << std::endl;
    }
    MPI_Finalize();
    return 0;
}
The missing part is the code that gets a function pointer from the lambda in the reduce function. From several related questions, this problem of getting a function pointer from a capturing lambda seems to be tricky but solvable. However, I failed to get anything working in this simple code (I tried some tricks with std::function, std::bind, and storing the lambda in a static variable)... So a little help would be great!
EDIT: Following @noma's answer, I tried the following simplified code without MPI on Godbolt:
#include <iostream>
#include <functional>

typedef double MPI_Datatype;

template<typename BinaryOp, BinaryOp op> // older standards
void non_lambda(void *a, void *b, int *len, MPI_Datatype *)
{}

template<typename BinaryOp>
void reduce(double *data, int len, BinaryOp op) {
    typedef void (MPI_User_function)(void *a, void *b, int *len, MPI_Datatype *);
    MPI_User_function *opPtr = &non_lambda<decltype(+op), +op>; // older standards
}

int main() {
    double data[4] = {1., 2., 3., 4.};
    reduce(data, 4, [](double a, double b) { return a + b; });
    return 0;
}
It compiles on some compilers. Here are the results:
icc >= 19.0.1 (with -std=c++17): OK
clang++ >= 5.0.0 (with -std=c++17): OK
clang++ 10.0.0 (with -std=c++14): NOK
g++ 9.3 (with -std=c++17): NOK
icc 19.0.0 (with -std=c++17): NOK
The error message with icc 19.0.0 with -std=c++17 (or icc 19.0.1 with -std=c++14) is interesting:
<source>(15): error: expression must have a constant value
MPI_User_function *opPtr = &non_lambda<decltype(+op), +op>; // older standards;
^
detected during instantiation of "void reduce(double *, int, BinaryOp) [with BinaryOp=lambda [](double, double)->double]" at line 21
And indeed, I don't really understand passing the 'op' variable, which is a runtime argument of the function reduce, as the second template parameter of non_lambda... Is it an obscure C++17 feature that only some compilers support?

I think the lambda approach is not possible here because it is a capturing lambda, see
https://stackoverflow.com/a/28746827/7678171
We can use a function template with the BinaryOp as a non-type template parameter instead of a lambda here. This assumes that the BinaryOp is either a function pointer or a capture-less lambda that can be converted into one. Instead of the lambda inside your reduce we introduce:
template<auto op> // this is C++17, so use -std=c++17
// template<typename BinaryOp, BinaryOp op> // older standards
void non_lambda(void *a, void *b, int *len, MPI_Datatype *)
{
    double *aa = static_cast<double *>(a);
    double *bb = static_cast<double *>(b);
    for (int i = 0; i < *len; ++i) {
        bb[i] = op(aa[i], bb[i]);
    }
}
The Black Magic line then is:
/* black magic code that gets the function pointer from the lambda */
MPI_User_function *opPtr = &non_lambda<+op>; // NOTE: the + triggers the lambda-to-function-pointer conversion here
// MPI_User_function *opPtr = &non_lambda<decltype(+op), +op>; // older standards
Hope this helps.
NOTE: I got this to compile using Clang 6.0, but g++ 7.5 failed (possibly a compiler bug?):
error: no matches converting function ‘non_lambda’ to type ‘void (*)(void*, void*, int*, struct ompi_datatype_t**)’
MPI_User_function *opPtr = &non_lambda<+op>;
^~~~~
note: candidate is: template<auto op> void non_lambda(void*, void*, int*, ompi_datatype_t**)
void non_lambda(void *a, void *b, int *len, MPI_Datatype *)
Maybe newer g++ versions work.
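For reference, the "storage of the lambda in a static variable" trick mentioned in the question also works, even for capturing lambdas. This is only a sketch (not from the answer above): stash the operator in a function-local static, which a capture-less lambda can reach without capturing it. Note that the static is shared by every call instantiated with the same BinaryOp type, so it is not safe if such calls run concurrently with different captured state.
#include <mpi.h>
#include <functional>

template<typename BinaryOp>
void reduce(double *data, int len, BinaryOp op) {
    // One copy per instantiation; visible to the capture-less lambda below
    // because variables with static storage duration need not be captured.
    static std::function<double(double, double)> staticOp;
    staticOp = op;

    MPI_User_function *opPtr = [](void *a, void *b, int *n, MPI_Datatype *) {
        double *aa = static_cast<double *>(a);
        double *bb = static_cast<double *>(b);
        for (int i = 0; i < *n; ++i) {
            bb[i] = staticOp(aa[i], bb[i]);
        }
    };

    MPI_Op mpiOp;
    MPI_Op_create(opPtr, 1, &mpiOp);
    MPI_Allreduce(MPI_IN_PLACE, data, len, MPI_DOUBLE, mpiOp, MPI_COMM_WORLD);
    MPI_Op_free(&mpiOp);
}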

Related

How to fix "error: call to 'abs' is ambiguous"

I'm running a simple C++ program from HackerRank about pointers and it works fine on the website. However,
when I run it on macOS, I get error: call to 'abs' is ambiguous and I'm not sure what exactly is ambiguous.
I've looked at other answers to similar issues, but the error message there tends to be Ambiguous overload call to abs(double), which is not the issue I'm having, since I haven't used any doubles. I've also tried including the header files cmath and math.h, but the problem persists.
#include <stdio.h>
#include <cmath>

void update(int *a, int *b) {
    int num1 = *a;
    int num2 = *b;
    *a = num1 + num2;
    *b = abs(num1 - num2);
}

int main() {
    int a, b;
    int *pa = &a, *pb = &b;
    scanf("%d %d", &a, &b);
    update(pa, pb);
    printf("%d\n%d", a, b);
    return 0;
}
My issue occurs with line 8.
The full error message is:
$ clang++ test.cpp
test.cpp:8:10: error: call to 'abs' is ambiguous
*b = abs(num1 - num2);
^~~
.../include/c++/v1/math.h:769:1: note: candidate function
abs(float __lcpp_x) _NOEXCEPT {return ::fabsf(__lcpp_x);}
^
.../include/c++/v1/math.h:769:1: note: candidate function
abs(double __lcpp_x) _NOEXCEPT {return ::fabs(__lcpp_x);}
^
.../include/c++/v1/math.h:769:1: note: candidate function
abs(long double __lcpp_x) _NOEXCEPT {return ::fabsl(__lcpp_x);}
^
1 error generated.
The three overloads of abs that you have from <cmath> are abs(float), abs(double) and abs(long double); it's ambiguous because you have an int argument and the compiler doesn't know which floating-point type to convert to.
abs(int) is defined in <cstdlib>, so #include <cstdlib> will resolve your problem.
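For example, the question's update() compiles cleanly once the extra header is included; a minimal sketch:
#include <cstdio>
#include <cstdlib>  // declares abs(int) / std::abs(int)

void update(int *a, int *b) {
    int num1 = *a;
    int num2 = *b;
    *a = num1 + num2;
    *b = std::abs(num1 - num2);  // now unambiguously the int overload
}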
If you're using Xcode, you can get more details about the error in the Issues navigator (⌘5) and clicking the triangle next to your issue.
For me, #include <cstdlib> didn't solve the issue, maybe because I didn't have to include anything to use abs in the first place. So, in case it helps someone else, an explicit cast worked well for me, as in the following code:
*b = abs(int(num1 - num2));
In templated code, it may be easily overlooked that std::abs is not defined for unsigned types. As an example, if the following method is instantiated for an unsigned type, the compiler may rightfully complain that std::abs is undefined:
#include <cmath>
#include <cstdint>
#include <iostream>

template<typename T>
bool areClose(const T& left, const T& right) {
    // This is bad because for unsigned T, std::abs is undefined
    // and for integral T, we compare with a float instead of
    // comparing for equality:
    return (std::abs(left - right) < 1e-7);
}

int main() {
    uint32_t vLeft = 17;
    uint32_t vRight = 18;
    std::cout << "Are the values close? " << areClose(vLeft, vRight) << std::endl;
}
A better definition of areClose() in the above code, which would coincidentally also solve the problem of std::abs() being undefined, could look like this:
#include <type_traits> // for std::is_integral

template<typename T>
bool areClose(const T& left, const T& right) {
    // This is better: compare all integral values for equality:
    if constexpr (std::is_integral<T>::value) {
        return (left == right);
    } else {
        return (std::abs(left - right) < 1e-7);
    }
}
If you're using a C compiler, you should include
#include <stdlib.h>
and use abs without std::.
If you use a C++ compiler, then you should change abs to std::abs.
Hope it helps :)
I used #include <bits/stdc++.h> as the only include statement and it worked for me.
My code:
#include <bits/stdc++.h>
using namespace std;

class Solution {
public:
    vector<int> findDuplicates(vector<int>& nums) {
        int n = nums.size();
        if (n == 0 || n == 1)
            return {};
        vector<int> ans;
        for (int i = 0; i < n; i++)
        {
            if (nums[abs(nums[i])-1] < 0)
                ans.push_back(abs(nums[i]));
            else
                nums[abs(nums[i])-1] = -1 * nums[abs(nums[i])-1];
        }
        return ans;
    }
};

Variadic function int to size_t warning

I have a function that accepts multiple arguments.
#include <iostream>
#include <cstdarg>
#include <vector>
using namespace std;

template<typename... Values>
void doSomething(size_t input, Values... inputs)
{
    size_t len = sizeof...(Values) + 1;
    size_t vals[] = {input, inputs...};
    vector<size_t> n(len);
    std::copy(vals, vals + len, n.data());
    for (size_t i = 0; i < len; i++) cout << n[i] << endl;
    // Do something with n vector
}
It works fine when I call this function by:
size_t a(1), b(2), c(3);
doSomething(a,b,c);
However, it will have a problem when I call this function by:
doSomething(1,2,3);
It gives this warning message:
warning: narrowing conversion of ‘inputs#0’ from ‘int’ to ‘size_t {aka long unsigned int}’ inside { } [-Wnarrowing]
size_t vals[] = {inputs...};
I do not like this warning message; is there a way to solve this problem? I would like the function to be able to accept either size_t or int. Thank you.
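One common way to silence the narrowing warning (a sketch, not an answer from this thread) is to convert each pack element explicitly, so the braced initializer only ever sees size_t values:
#include <cstddef>
#include <iostream>
#include <vector>

template<typename... Values>
void doSomething(std::size_t input, Values... inputs)
{
    // Explicit conversions: no narrowing inside the braced initializer.
    // Note that negative int arguments would wrap around here.
    std::vector<std::size_t> n{input, static_cast<std::size_t>(inputs)...};
    for (std::size_t v : n) std::cout << v << std::endl;
    // Do something with n vector
}

int main() {
    doSomething(1, 2, 3);  // no -Wnarrowing warning
}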

Passing a function pointer and its parameters as a thrust::tuple to a global function

I want to do the following:
#include <thrust/tuple.h>
#include <tuple>

template<typename... Args>
void someFunction(void (*fp)(Args...), thrust::tuple<Args...> params) {
}

void otherFunction(int n) {
}

int main(int argc, char **argv) {
    //// template argument deduction/substitution failed ////
    someFunction<int>(&otherFunction, thrust::make_tuple(1));
    return 0;
}
What I have tried:
Removing one of the two parameters leads to a working solution, of course.
It works when I make someFunction a static function in a struct with a template parameter. But in the original code someFunction is a CUDA kernel, so I can't do that. Any further ideas?
It works when I change thrust::tuple to std::tuple. Is there a way to construct a thrust::tuple out of a std::tuple? (see the sketch below)
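On that last point, a std::tuple can be repackaged into a thrust::tuple with an index_sequence expansion. A sketch (the helper names are hypothetical, not from this thread), assuming C++14 and that the element count stays within thrust::tuple's limit:
#include <thrust/tuple.h>
#include <tuple>
#include <utility>

// Hypothetical helper: unpack a std::tuple into thrust::make_tuple element by element.
template<typename... Args, std::size_t... I>
thrust::tuple<Args...> to_thrust_impl(const std::tuple<Args...>& t, std::index_sequence<I...>) {
    return thrust::make_tuple(std::get<I>(t)...);
}

template<typename... Args>
thrust::tuple<Args...> to_thrust(const std::tuple<Args...>& t) {
    return to_thrust_impl(t, std::index_sequence_for<Args...>{});
}

// usage: thrust::tuple<int, float> tt = to_thrust(std::make_tuple(1, 2.0f));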
EDIT:
To make it clearer: someFunction and otherFunction are __global__!
#include <thrust/tuple.h>
#include <tuple>

template<typename... Args>
__global__ void someFunction(void (*fp)(Args...), thrust::tuple<Args...> params) {
}

__global__ void otherFunction(int n) {
}

__constant__ void (*kfp)(int) = &otherFunction;

int testPassMain(int argc, char **argv) {
    void (*h_kfp)(int);
    cudaMemcpyFromSymbol(&h_kfp, kfp, sizeof(void *), 0, cudaMemcpyDeviceToHost);
    someFunction<int><<<1,1>>>(h_kfp, thrust::make_tuple(1));
    return 0;
}
I get a compiler error: template argument deduction/substitution failed in both examples.
Something like this should be workable:
$ cat t1161.cu
#include <thrust/tuple.h>
#include <stdio.h>

template <typename T, typename T1>
__global__ void kernel(void (*fp)(T1), T params){  // "someFunction"
    fp(thrust::get<0>(params));
    fp(thrust::get<1>(params));
}

__device__ void df(int n){  // "otherFunction"
    printf("parameter = %d\n", n);
}

__device__ void (*ddf)(int) = df;

int main(){
    void (*hdf)(int);
    thrust::tuple<int, int> my_tuple = thrust::make_tuple(1,2);
    cudaMemcpyFromSymbol(&hdf, ddf, sizeof(void *));
    kernel<<<1,1>>>(hdf, my_tuple);
    cudaDeviceSynchronize();
}
$ nvcc -o t1161 t1161.cu
$ cuda-memcheck ./t1161
========= CUDA-MEMCHECK
parameter = 1
parameter = 2
========= ERROR SUMMARY: 0 errors
$
A similar methodology should also be workable if you intend df to be a __global__ function; you will just need to account properly for the dynamic parallelism case. Likewise, only a slight variation on the above should allow you to pass the tuple directly to the child function (i.e. df, whether device function or kernel). It's not clear to me why you need variadic template arguments if your parameters are nicely packaged up in a thrust tuple.
EDIT: If you can pass your tuple to the child kernel (I don't see why you wouldn't be able to, since according to your updated example the tuple and the child kernel share the same variadic parameter pack), then you may still be able to avoid variadic templates using this approach:
$ cat t1162.cu
#include <thrust/tuple.h>
#include <stdio.h>

template<typename T>
__global__ void someFunction(void (*fp)(T), T params) {
    fp<<<1,1>>>(params);
    cudaDeviceSynchronize();
}

__global__ void otherFunction(thrust::tuple<int> t) {
    printf("param 0 = %d\n", thrust::get<0>(t));
}

__global__ void otherFunction2(thrust::tuple<float, float> t) {
    printf("param 1 = %f\n", thrust::get<1>(t));
}

__device__ void (*kfp)(thrust::tuple<int>) = &otherFunction;
__device__ void (*kfp2)(thrust::tuple<float, float>) = &otherFunction2;

int main(int argc, char **argv) {
    void (*h_kfp)(thrust::tuple<int>);
    void (*h_kfp2)(thrust::tuple<float, float>);
    cudaMemcpyFromSymbol(&h_kfp, kfp, sizeof(void *), 0, cudaMemcpyDeviceToHost);
    someFunction<<<1,1>>>(h_kfp, thrust::make_tuple(1));
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&h_kfp2, kfp2, sizeof(void *), 0, cudaMemcpyDeviceToHost);
    someFunction<<<1,1>>>(h_kfp2, thrust::make_tuple(0.5f, 1.5f));
    cudaDeviceSynchronize();
    return 0;
}
$ nvcc -arch=sm_35 -rdc=true -o t1162 t1162.cu -lcudadevrt
$ CUDA_VISIBLE_DEVICES="1" cuda-memcheck ./t1162
========= CUDA-MEMCHECK
param 0 = 1
param 1 = 1.500000
========= ERROR SUMMARY: 0 errors
$
In terms of functionality (being able to dispatch multiple child kernels with varying parameter packs) I don't see any difference in capability, again assuming your parameters are nicely packaged in a tuple.
A quick and dirty solution is to cast the function pointer:
#include <thrust/tuple.h>
#include <tuple>
#include <stdio.h>  // for printf

template<typename... Args>
__global__ void someFunction(void (*fp)(), thrust::tuple<Args...> params) {
    void (*kfp)(Args...) = (void (*)(Args...)) fp;
    kfp<<<1,1>>>(thrust::get<0>(params));
}

__global__ void otherFunction(int n) {
    printf("n = %d\n", n);
}

__constant__ void (*kfp)(int) = &otherFunction;

int testPassMain(int argc, char **argv) {
    void (*h_kfp)();
    cudaMemcpyFromSymbol(&h_kfp, kfp, sizeof(void *), 0, cudaMemcpyDeviceToHost);
    someFunction<int><<<1,1>>>(h_kfp, thrust::make_tuple(1));
    return 0;
}
I'm open to nicer solutions!

How to pass the address of a template kernel function to a CUDA function?

I want to use CUDA runtime API functions that accept CUDA kernel function pointers, together with kernel templates.
I am able to do the following without templates:
__global__ void myKernel()
{
    ...
}

void myFunc(const char* kernel_ptr)
{
    ...
    // use API functions like
    cudaFuncGetAttributes(&attrib, kernel_ptr);
    ...
}

int main()
{
    myFunc(myKernel);
}
However the above does not work when the kernel is a template.
Another example:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
template<typename T>
__global__ void addKernel(T *c, const T *a, const T *b)
{
int i = threadIdx.x;
c[i] = a[i] + b[i];
}
int main()
{
cudaFuncAttributes attrib;
cudaError_t err;
//OK:
err = cudaFuncGetAttributes(&attrib, addKernel<float>); // works fine
printf("result: %s, reg1: %d\n", cudaGetErrorString(err), attrib.numRegs);
//NOT OK:
//try to get function ptr to pass as an argument:
const char* ptr = addKernel<float>; // compile error
err = cudaFuncGetAttributes(&attrib, ptr);
printf("result: %s, reg2: %d\n", cudaGetErrorString(err), attrib.numRegs);
}
The above results in a compile error:
error : no instance of function template "addKernel" matches the required type
Edit:
The only workaround I've found so far is to put the stuff inside myFunc (see the first code example) into a macro, which is ugly, but it requires no pointer argument passing and it works fine:
#define MY_FUNC(kernel) \
{ \
    ... \
    cudaFuncGetAttributes(&attrib, kernel); \
    ... \
}
Usage:
MY_FUNC( myKernel<float> )
Referring to your code contained in "Another example":
change this:
const char* ptr = addKernel<float>; // compile error
to this:
void (*ptr)(float *, const float *, const float *) = addKernel<float>;
And I believe it will compile and run correctly.
I don't know if it's useful or not in the overall scope of what you are trying to do.
EDIT responding to a question in the comments:
Once I have the pointer "extracted" from the function, I can then cast it to another type. Try it. For example, the following code also works:
void (*ptr)(float *, const float *, const float *) = addKernel<float>;
const char *ptr1 = (char *)ptr;
err = cudaFuncGetAttributes(&attrib, ptr1);
So, to answer your question, you can cast your function pointer to const char* if you want to, once you have your function pointer.
By the way, the code you posted as an answer throws compile errors for me on gcc 4.1.2 and gcc 4.4.6:
$ nvcc -arch=sm_20 -O3 -o t201 t201.cu
t201.cu: In function 'int main()':
t201.cu:25: error: address of overloaded function with no contextual type information
t201.cu:29: error: address of overloaded function with no contextual type information
$
I also get errors if I remove the & on those two lines:
$ nvcc -arch=sm_20 -O3 -o t201 t201.cu
t201.cu: In function 'int main()':
t201.cu:25: error: insufficient contextual information to determine type
t201.cu:29: error: insufficient contextual information to determine type
$
So some of this may be compiler dependent, in terms of what steps are needed to get from point A to point B.
The type of addKernel<float> is not char *; it's a function type.
Instead, get the address of addKernel<float> like this:
typedef void (*fun_ptr)(float*, const float*, const float*);
fun_ptr ptr = addKernel<float>;
err = cudaFuncGetAttributes(&attrib, ptr);
Edit: added a templated version based on the CUDA runtime and Robert Crovella's answer.
Here is a full working example using void function pointers and templates.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
template <typename T>
__global__ void addKernel(T *c, const T *a, const T *b)
{
int i = threadIdx.x;
c[i] = a[i] + b[i];
}
cudaError_t func1( cudaFuncAttributes* attrib, void (*ptr)() )
{
return cudaFuncGetAttributes(attrib, ptr);
}
cudaError_t func2( cudaFuncAttributes* attrib, const char* ptr )
{
return cudaFuncGetAttributes(attrib, ptr);
}
template <typename T>
cudaError_t func2( cudaFuncAttributes* attrib, T ptr )
{
return func2( attrib, (const char*) ptr);
}
int main()
{
cudaFuncAttributes attrib;
cudaError_t err;
void (*ptr2)() = (void(*)())(addKernel<float>); // OK on Visual Studio
err = func1(&attrib, ptr2);
printf("result: %s, reg1: %d\n", cudaGetErrorString(err), attrib.numRegs);
err = func2(&attrib, addKernel<double> ); // OK nice and standard
printf("result: %s, reg2: %d\n", cudaGetErrorString(err), attrib.numRegs);
}

How do I store a function to a variable?

I think they are called functors? (it's been a while)
Basically, I want to store a pointer to a function in a variable, so I can specify which function I want to use from the command line.
All the functions return and take the same values.
unsigned int func_1 (unsigned int var1)
unsigned int func_2 (unsigned int var1)
function_pointer = either of the above?
so then I could call it by going: function_pointer(my_variable)?
EDIT:
As per @larsmans's suggestion, I now have this:
Config.h:
class Config
{
public:
    unsigned static int (*current_hash_function)(unsigned int);
};
Config.cpp:
#include "Config.h"
#include "hashes.h"
unsigned static int (*current_hash_function)(unsigned int) = kennys_hash_16;
hashes.h:
unsigned int kennys_hash(unsigned int out);
unsigned int kennys_hash_16(unsigned int out);
hashes.cpp:
just implements the functions in the header
main.cpp:
#include "Config.h"
#include "hashes.h"
// in test_network:
unsigned int hashed = Config::current_hash_function(output_binary);
//in main():
else if (strcmp(argv[i], "-kennys_hash_16") == 0)
{
Config::current_hash_function = kennys_hash_16;
}
else if (strcmp(argv[i], "-kennys_hash_8") == 0)
{
Config::current_hash_function = kennys_hash;
}
The error I get:
g++ -o hPif src/main.o src/fann_utils.o src/hashes.o src/Config.o -lfann -L/usr/local/lib
Undefined symbols:
"Config::current_hash_function", referenced from:
test_network() in main.o // the place in the code I've selected to show
auto_test_network_with_random_data(unsigned int, unsigned int, unsigned int)in main.o
generate_data(unsigned int, unsigned int, unsigned int)in main.o
_main in main.o // the place in the code I've selected to show
_main in main.o // the place in the code I've selected to show
generate_train_file() in fann_utils.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make: *** [hPif] Error 1
The simplest thing you can do is
unsigned int (*pFunc)(unsigned int) = func_1;
This is a bare function pointer, which cannot be used to point to anything other than a free function.
You can make it less painful if your compiler supports the C++0x auto keyword:
auto pFunc = func_1;
In any case, you can call the function with
unsigned int result = pFunc(100);
There are many other options that provide generality, for example:
You can use boost::function with any C++ compiler
With a compiler implementing features of C++0x you can use std::function
These can be used to point to any entity that can be invoked with the appropriate signature (it's actually objects that implement an operator() that are called functors).
Update (to address updated question)
Your immediate problem is that you attempt to use Config::current_hash_function (which you declare just fine) but fail to define it.
This defines a global static pointer to a function, unrelated to anything in class Config:
unsigned static int (*current_hash_function)(unsigned int) = kennys_hash_16;
This is what you need instead:
unsigned int (*Config::current_hash_function)(unsigned int) = kennys_hash_16;
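So the Config.cpp from the question would become (a sketch in context, using the declarations from the question's Config.h and hashes.h):
#include "Config.h"
#include "hashes.h"

// Definition of the static member declared inside class Config:
unsigned int (*Config::current_hash_function)(unsigned int) = kennys_hash_16;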
Since C++11 you can use std::function to store functions. To store a function, you use it as follows:
std::function<return type(parameter type(s))>
As an example:
#include <functional>
#include <iostream>

int fact(int a) {
    return a > 1 ? fact(a - 1) * a : 1;
}

int pow(int b, int p) {
    return p > 1 ? pow(b, p - 1) * b : b;
}

int main(void) {
    std::function<int(int)> factorial = fact;
    std::function<int(int, int)> power = pow;
    // usage
    factorial(5);
    power(2, 5);
}
No, these are called function pointers.
unsigned int (*fp)(unsigned int) = func_1;
You could also use function, either from C++0x or from Boost.
That would be
boost::function<int(int)>
and then use bind to bind your function to this type.
Have a look here and here.
OK, here is an example. I hope that helps.
#include <boost/bind.hpp>
#include <boost/function.hpp>
#include <iostream>

int MyFunc1(int i)
{
    std::cout << "MyFunc1: " << i << std::endl;
    return i;
}

int MyFunc2(int i)
{
    std::cout << "MyFunc2: " << i << std::endl;
    return i;
}

int main(int /*argc*/, char** /*argv*/)
{
    typedef boost::function<int(int)> Function_t;
    Function_t myFunc1 = boost::bind(&MyFunc1, _1);
    Function_t myFunc2 = boost::bind(&MyFunc2, _1);
    myFunc1(5);
    myFunc2(6);
}
You can store a function in a variable in C++ this way:
auto function_name = [&](params){
    statements
};
auto add = [&](int a, int b){
    return a + b;
};
cout << add(5, 6);
typedef unsigned int (*PGNSI)(unsigned int);
PGNSI variable1 = func_1;
PGNSI variable2 = func_2;
unsigned int (* myFuncPointer)(unsigned int) = &func_1;
However, the syntax for function pointers is awful, so it's common to typedef them:
typedef unsigned int (* myFuncPointerType)(unsigned int);
myFuncPointerType fp = &func_1;
If you have Boost installed, you can also check out Boost.Function.