CUDA C++ Templating of Kernel Parameter

CUDA C++ Templating of Kernel Parameter - c++

I'm trying to templatize a CUDA kernel based on a boolean variable (as shown here: Should I unify two similar kernels with an 'if' statement, risking performance loss?), but I keep getting a compiler error that says my function is not a template. I think that I'm just missing something obvious so it's pretty frustrating.
The following does NOT work:
util.cuh
#include "kernels.cuh"
//Utility functions
kernels.cuh
#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
#endif
kernels.cu
template<bool approx>
__global__ void kernel(...params...)
{
if(approx)
{
//Approximate calculation
}
else
{
//Exact calculation
}
}
template __global__ void kernel<false>(...params...); //Error occurs here
main.cu
#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);
The following DOES work:
util.cuh
#include "kernels.cuh"
//Utility functions
kernels.cuh
#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
template<bool approx>
__global__ void kernel(...params...)
{
if(approx)
{
//Approximate calculation
}
else
{
//Exact calculation
}
}
#endif
main.cu
#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);
If I throw in the
template __global__ void kernel<false>(...params...);
line at the end of kernels.cuh it also works.
I get the following errors (both referring to the marked line above):
kernel is not a template
invalid explicit instantiation declaration
If it makes a difference I compile all of my .cu files in one line, like:
nvcc -O3 -arch=sm_21 -I. main.cu kernels.cu -o program

All explicit specialization declarations must be visible at the time of the template instantiation. Your explicit specialization declaration is visible only in the kernels.cu translation unit, but not in main.cu.
The following code is indeed working correctly (apart from adding a __global__ qualifier at the explicit instantiation instruction).
#include<cuda.h>
#include<cuda_runtime.h>
#include<stdio.h>
#include<conio.h>
template<bool approx>
__global__ void kernel()
{
if(approx)
{
printf("True branch\n");
}
else
{
printf("False branch\n");
}
}
template __global__ void kernel<false>();
int main(void) {
kernel<false><<<1,1>>>();
getch();
return 0;
}
EDIT
In C++, templated functions are not compiled until an explicit instantiation of the function is encountered. From this point of view, CUDA, which now fully supports templates, behaves exactly the same way as C++.
To make a concrete example, when the compiler finds something like
template<class T>
__global__ void kernel(...params...)
{
...
T a;
...
}
it just checks the function syntax, but produces no object code. So, if you would compile a file with a single templated function as above, you will have an "empty" object file. This is reasonable, since the compiler would not know which type assigning to a.
The compiler produces an object code only when it encounters an explicit instantiation of the function template. This is, at that moment, how compilation of templated functions work and this behavior introduces a restriction for multiple-file projects: the implementation (definition) of a templated function must be in the same file as its declaration. So, you cannot separate the interface contained in kernels.cuh in a header file separated from kernels.cu, which is the main reason why the first version of your code does not compile. Accordingly, you must include both interface and implementation in any file that uses the templates, namely, you must include in main.cu both, kernels.cuh and kernels.cu.
Since no code is generated without an explicit instantiation, compilers tolerate the inclusion more than once of the same template file with both declarations and definitions in a project without generating linkage errors.
There are several tutorials on using templates in C++. An Idiot's Guide to C++ Templates - Part 1, apart from the irritating title, will provide you with a step-by-step introduction to the topic.

Related

What is the proper way to define a templated class's member function when behavior is identical for template types?

Figuring if something wasn't broke, I'd break it, I decided to specialize a class I had so that it could be templated between float and double precision automagically.
I have the following [simplified] class declaration:
// Quatcam.h
#pragma once
#include <boost/math/quaternion.hpp>
#include <boost/numeric/ublas/matrix.hpp>
template<typename FloatType>
class QuaternionCamera
{
public:
QuaternionCamera();
void applyTranslation(boost::numeric::ublas::vector<FloatType> translationVector);
boost::numeric::ublas::matrix<FloatType> getTranslationMatrix();
protected:
boost::numeric::ublas::vector<FloatType> m_location;
boost::math::quaternion<FloatType> m_orientation;
};
I have defined the member functions in a .cpp file:
//Quatcam.cpp
#include "Quatcam.h"
using namespace boost::numeric::ublas;
template<typename FloatType>
QuaternionCamera<FloatType>::QuaternionCamera()
: m_location(3),
m_orientation(1,0,0,0)
{
m_location[0] = m_location[1] = m_location[2] = 0;
}
template<typename FloatType>
void QuaternionCamera<FloatType>::applyTranslation(boost::numeric::ublas::vector<FloatType> translationVector)
{
m_location += translationVector;
}
template<typename FloatType>
boost::numeric::ublas::matrix<FloatType> QuaternionCamera<FloatType>::getTranslationMatrix()
{
boost::numeric::ublas::matrix<FloatType> returnMatrix = boost::numeric::ublas::identity_matrix<FloatType>(4,4);
boost::numeric::ublas::vector<FloatType> invTrans = -m_location;
returnMatrix(3,0) = invTrans[0];
returnMatrix(3,1) = invTrans[1];
returnMatrix(3,2) = invTrans[2];
return returnMatrix;
}
This code by itself will happily compile into a .lib or .obj file, but attempting to use the class in situ results in linker errors. Here is my example main.cpp attempting to use the class:
#include "Quatcam.h"
#include <boost/numeric/ublas/io.hpp>
#include <iostream>
int main(int argc, char** argv)
{
QuaternionCamera<float> qcam;
boost::numeric::ublas::vector<float> loc(3);
loc[0] = 0;
loc[1] = 5;
loc[2] = 0;
qcam.applyTranslation(loc);
boost::numeric::ublas::matrix<float> qtm = qcam.getTranslationMatrix();
std::cout << "qtm: "<< qtm << std::endl;
return 0;
}
This code fails to link with an error for missing symbols for getTranslationMatrix and applyTranslation. I assume this is because I haven't technically specified a full specialization of the functions for the type float.
Question(s)
Given that the behavior is the same for any atomic input type (float, double, even int, etc...) and only affects the precision of the answers.
Is there a way to force the compiler to emit specializations for all of them without having to;
move all of the function definitions into the header file, or;
explicitly create specializations for all data types that would presumably involve a lot of copypasta?

Recommended links
Why can templates only be implemented in the header file?
Why do C++ template definitions need to be in the header?
Recommended Practice
Instead of moving the definitions from the .cpp to the header, rename the .cpp to .tpp and add #include "Quatcam.tpp" at the end of Quatcam.h.
This is how you typically split up the template declarations, and their definitions, while still having the definitions available for instantiation.
Note: If you follow this road, you should not compile the .tpp by itself, as you were doing with the .cpp.
Explicit Instantiation
You can explicitly instantiate the templates in question in your .cpp to provide them for the linker, but that requires that you know the exact types that you'd require an instantation of.
This means that if you only explicitly instantiate QuaternionCamera<float>, you'd still get a linker error if main.cpp tries to use QuaternionCamera<double>.
There's no way of forcing instantiation of all "atomic input types", you'll have to write them all out explicitly.
template class QuaternionCamera<float>; // explicit instantiation
template class QuaternionCamera<double>; // etc, etc...

You should put these functions into the header file, not into the .cpp source.
The compiler only creates function instantiations after the template argument deduction is complete. The resulting object file will contain a compiled function for each type that the template was used with.
However, .cpp files are compiled separately. So, when you compile Quatcam.cpp, the compiler doesn't find any instantiations for this type, and doesn't create a function body. This is why you end up with a linker error.
To put it simply, this is how your header should look like:
template<typename T>
class Foo {
void Print();
T data;
};
// If template arguments are specified, function body goes to .cpp
template<>
void Foo<float>::Print();
// Template arguments are incomplete, function body should remain in the header
template<typename T>
void Foo<T>::Print() {
std::cout << data;
}
And this should to the .cpp source:
template<>
void Foo<float>::Print() {
std::cout << floor(data);
}

CUDA linker error with template class

Using CUDA 5.0 on ubuntu with gcc/g++ 4.6, I'm getting errors when linking against CUDA code with templates.
cu_array.cu:
#include "cu_array.hpp"
template<class T>
CuArray<T>::CuArray(unsigned int n) {
cudaMalloc(&data,n*sizeof(T));
}
cu_array.hpp:
#pragma once
template<class T>
class CuArray {
public:
CuArray(unsigned int n);
private:
T* data;
};
main.cu:
#include "cu_array.hpp"
int main() {
CuArray<float> a(10);
}
These compile fine with nvcc -c, but linking with nvcc cu_array.o main.o gives undefined reference to CuArray<float>::CuArray(unsigned int). If I move the contents of cu_array.cu into the header and only build the main, it uses the templates just fine. Or if I remove the templates altogether, the code naturally links fine.
I'm sure there's a simple answer for this. Any ideas?

You haven't instantiated the class in the compilation unit where it is defined, so the compiler doesn't emit any code for the class member function, and linkage fails. This isn't specific to CUDA, this greedy style of instantiation is the compilation/linkage model g++ uses, and lots of people get caught out by it.
As you have found already, the simplest solution is to include everything into the same compilation unit, and the problem disappears.
Otherwise if you explicitly instantiate CuArray::CuArray at the bottom of cu_array.cu like this:
template CuArray<float>::CuArray(unsigned int);
the compiler will emit code where it would otherwise not, and the linkage problem will be fixed. You will need to instantiate every class function for every type you want to use elsewhere in the code to make this approach work.

Multiple Template Instantiation

I'm designing a CUDA-C++ library with template classes. There are template functions my classes use, and they are invisible to main as well as the user. I need to specialize them explicitly because of the two steps of compiling to be performed, otherwise I'd get an "unresolved external" error when linking. Being this classes used in main.cpp, there's no way (I guess...) to tell nvcc what types are going to be used in tha main program, so I thought of using some macros to specialize them. Here's a simplified versione of the code:
//CUDA_functions.h
// CUDA functions declared here and included in files that will be compiled
// with g++. Those functions are implemented in .cu files, compiled with nvcc
template <typename T>
void foo1(T x);
template <typename T>
void foo2(T x);
template <typename T>
void foo3(T x);
//fileA.h - included in main.cpp
#include "CUDA_functions.h"
template <typename T>
class A {
// it uses foo1 & foo2 inside
}
//fileB.h - included in main.cpp
#include "CUDA_functions.h"
template <typename T>
class B {
// it uses foo1 & foo3 inside
}
//macros.h
#define _USE_CLASS_A(T) template void foo1(T); \
template void foo2(T); /**/
#define _USE_CLASS_B(T) template void foo1(T); \
template void foo3(T); /**/
//user_spec.cu - template specializations by user. This is the first file to be
// - compiled and it doesn't know what classes are going to be used
// say, user wants to use classes A & B: HERE THE ERROR RAISES!
#include "macros.h"
_USE_CLASS_A( int );
_USE_CLASS_B( int );
When I compile this code with Visual Studio, I get a warning about the double explicit instantiation (foo1), but when I compile it with g++ warning becomes an error!
I can't write macros like
#define _USE_FOO1(T) template void foo1(T) /**/
#define _USE_FOO2(T) template void foo2(T) /**/
#define _USE_FOO3(T) template void foo3(T) /**/
because the user doesn't have to worry about the existence of those functions and I'd like to specialize a list of them based on what class he/she is going to use. Last but not least, I found nothing about a "conditional specialization" of template. What can I do to solve? Thanks to everyone would be so nice to answer. Bye.

Is it for host code or device code? I believe CUDA does not support linking for device code. Linking template functions in host code has always been a bit fishy, CUDA or no CUDA.
Instead of having your hands dirty with macros -- how about putting them in a header, inside of namespace detail?
By convention, detail namespace indicates library internal stuff that you shouldn't ever access as a user.

Separate compiling with MinGW

Using this tutorial Makefile I want to build a simple program with a separate compiling, The main problem is that the IDE Eclpse Indigo C\C++ (prespective) or MinGW I cannot compile the files. The error which I get is :
undefined reference to double getAverage<int, 85u>(int (&) [85u])'
undefined reference to int getMax<int, 85u>(int (&) [85u])'
undefined reference to int getMin<int, 85u>(int (&) [85u])'
undefined reference to void print<int, 85u>(int (&) [85u])'
undefined reference to void sort<int, 85u>(int (&) [85u])'
undefined reference to void print<int, 85u>(int (&) [85u])'
The main.cpp file is this :
#include "Tools.h"
#include <iostream>
using namespace std;
int main(int argc,char* argv[])
{
int numbers[] = {1,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8,-2,7,14,5,6,16,8};
cout <<"Average = "<< getAverage(numbers) << endl;
cout <<"Max element = "<< getMax(numbers) << endl;
cout <<"Minimal element = "<< getMin(numbers) << endl;
print(numbers);
sort(numbers);
print(numbers);
return 0;
}
and I have a Tools.h file :
#ifndef TOOLS_H_
#define TOOLS_H_
#include <iostream>
int getBigger(int numberOne,int numberTwo);
template <typename T,size_t N> double getAverage(T (&numbers)[N]);
template <typename T,size_t N> T getMax(T (&numbers)[N]);
template <typename T,size_t N> T getMin(T (&numbers)[N]);
/**
* Implementing a simple sort method of big arrays
*/
template <typename T,size_t N> void sort(T (&numbers)[N]);
/**
* Implementing a method for printing arrays
*/
template <typename T,size_t N> void print(T (&numbers)[N]);
#endif

When you compile Tools.cpp your compiler has no idea about the template parameters that you have used in main.cpp. Therefore it compiles nothing related to this templates.
You need to include theses template definitions from the compilation unit that uses them. The file Tools.cpp is often renamed to something like Tools.inl to indicate that it's neither a header file nor a separate compilation unit.
The compilation unit "main.cpp" could look like this:
#include "tools.h"
#include "tools.inl"
main()
{
int number[] = {1,2,3};
getaverage(numbers);
}
Since the compiler identifies the required specialization it can generate the code from the implementation file.

For most cases, harper's answer is appropriate. But for completeness' sake, explicit template instantiation should also be mentioned.
When you include the implementation in every compilation unit, your template classes and functions will be instantiated and compiled in all of them. Sometimes, this is not desirable. It is mostly due to compile-time memory restrictions and compilation time, if your template classes and functions are very complicated. This becomes a very real issue when you, or the libraries you use rely heavily on template metaprogramming. Another situation could be that your template function implementations might be used in many compilation units, and when you change the implementation, you will be forced to re-compile all those compilation units.
So, the solution in these situations is to include a header file like your tools.h, and have a tools.cpp, implementing the templates. The catch is that, you should explicitly instantiate your templates for all the template arguments that will be used throughout your program. This is accomplished via adding the following to tools.cpp:
template double getAverage<int,85>(int (&numbers)[85]);
Note: You obviously have to do something about that "85", such as defining it in a header file and using it across tools.cpp and main.cpp

I've found this article which is useful : templates and header files
I declared the function in the Tools.h file and include there the file Tool.hpp and after this I defined them in the Tools.hpp file.

I haven't tried to compile .cpp and .c files together but maybe my example will help.
I had similar problem compiling two separate assembly files .s on mingw with standard gcc
compiler and i achieved it as follows:
gcc -m32 -o test test.s hello.s
-m32 means i'm compiling 32bit code
-o is the output file ( which in my example is the "test" file )
test.s and hello.s are my source files. test.s is the main file and hello.s has the helper function. (Oh, to mention is the fact that both files are in the same directory)

C++ template function compiles in header but not implementation

I'm trying to learn templates and I've run into this confounding error. I'm declaring some functions in a header file and I want to make a separate implementation file where the functions will be defined.
Here's the code that calls the header (dum.cpp):
#include <iostream>
#include <vector>
#include <string>
#include "dumper2.h"
int main() {
std::vector<int> v;
for (int i=0; i<10; i++) {
v.push_back(i);
}
test();
std::string s = ", ";
dumpVector(v,s);
}
Now, here's a working header file (dumper2.h):
#include <iostream>
#include <string>
#include <vector>
void test();
template <class T> void dumpVector(const std::vector<T>& v,std::string sep);
template <class T> void dumpVector(const std::vector<T>& v, std::string sep) {
typename std::vector<T>::iterator vi;
vi = v.cbegin();
std::cout << *vi;
vi++;
for (;vi<v.cend();vi++) {
std::cout << sep << *vi ;
}
std::cout << "\n";
return;
}
With implementation (dumper2.cpp):
#include <iostream>
#include "dumper2.h"
void test() {
std::cout << "!olleh dlrow\n";
}
The weird thing is that if I move the code that defines dumpVector from the .h to the .cpp file, I get the following error:
g++ -c dumper2.cpp -Wall -Wno-deprecated
g++ dum.cpp -o dum dumper2.o -Wall -Wno-deprecated
/tmp/ccKD2e3G.o: In function `main':
dum.cpp:(.text+0xce): undefined reference to `void dumpVector<int>(std::vector<int, std::allocator<int> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
collect2: ld returned 1 exit status
make: *** [dum] Error 1
So why does it work one way and not the other? Clearly the compiler can find test(), so why can't it find dumpVector?

The problem you're having is that the compiler doesn't know which versions of your template to instantiate. When you move the implementation of your function to x.cpp it is in a different translation unit from main.cpp, and main.cpp can't link to a particular instantiation because it doesn't exist in that context. This is a well-known issue with C++ templates. There are a few solutions:
1) Just put the definitions directly in the .h file, as you were doing before. This has pros & cons, including solving the problem (pro), possibly making the code less readable & on some compilers harder to debug (con) and maybe increasing code bloat (con).
2) Put the implementation in x.cpp, and #include "x.cpp" from within x.h. If this seems funky and wrong, just keep in mind that #include does nothing more than read the specified file and compile it as if that file were part of x.cpp In other words, this does exactly what solution #1 does above, but it keeps them in seperate physical files. When doing this kind of thing, it is critical that you not try to compile the #included file on it's own. For this reason, I usually give these kinds of files an hpp extension to distinguish them from h files and from cpp files.
File: dumper2.h
#include <iostream>
#include <string>
#include <vector>
void test();
template <class T> void dumpVector( std::vector<T> v,std::string sep);
#include "dumper2.hpp"
File: dumper2.hpp
template <class T> void dumpVector(std::vector<T> v, std::string sep) {
typename std::vector<T>::iterator vi;
vi = v.begin();
std::cout << *vi;
vi++;
for (;vi<v.end();vi++) {
std::cout << sep << *vi ;
}
std::cout << "\n";
return;
}
3) Since the problem is that a particular instantiation of dumpVector is not known to the translation unit that is trying to use it, you can force a specific instantiation of it in the same translation unit as where the template is defined. Simply by adding this: template void dumpVector<int>(std::vector<int> v, std::string sep); ... to the file where the template is defined. Doing this, you no longer have to #include the hpp file from within the h file:
File: dumper2.h
#include <iostream>
#include <string>
#include <vector>
void test();
template <class T> void dumpVector( std::vector<T> v,std::string sep);
File: dumper2.cpp
template <class T> void dumpVector(std::vector<T> v, std::string sep) {
typename std::vector<T>::iterator vi;
vi = v.begin();
std::cout << *vi;
vi++;
for (;vi<v.end();vi++) {
std::cout << sep << *vi ;
}
std::cout << "\n";
return;
}
template void dumpVector<int>(std::vector<int> v, std::string sep);
By the way, and as a total aside, your template function is taking a vector by-value. You may not want to do this, and pass it by reference or pointer or, better yet, pass iterators instead to avoid making a temporary & copying the whole vector.

This was what the export keyword was supposed to accomplish (i.e., by exporting the template, you'd be able to put it in a source file instead of a header. Unfortunately, only one compiler (Comeau) ever really implemented export completely.
As to why the other compilers (including gcc) didn't implement it, the reason is pretty simple: because export is extremely difficult to implement correctly. Code inside the template can change meaning (almost) completely, based on the type over which the template is instantiated, so you can't generate a conventional object file of the result of compiling the template. Just for example, x+y might compile to native code like mov eax, x/add eax, y when instantiated over an int, but compile to a function call if instantiated over something like std::string that overloads operator+.
To support separate compilation of templates, you have to do what's called two-phase name lookup (i.e., lookup the name both in the context of the template and in the context where the template is being instantiated). You typically also have the compiler compile the template to some sort of database format that can hold instantiations of the template over an arbitrary collection of types. You then add in a stage between compiling and linking (though it can be built into the linker, if desired) that checks the database and if it doesn't contain code for the template instantiated over all the necessary types, re-invokes the compiler to instantiate it over the necessary types.
Due to the extreme effort, lack of implementation, etc., the committee has voted to remove export from the next version of the C++ standard. Two other, rather different, proposals (modules and concepts) have been made that would each provide at least part of what export was intended to do, but in ways that are (at least hoped to be) more useful and reasonable to implement.

Template parameters are resolved as compile time.
The compiler finds the .h, finds a matching definition for dumpVector, and stores it. The compiling is finished for this .h. Then, it continues parsing files and compiling files. When it reads the dumpVector implementation in the .cpp, it's compiling a totally different unit. Nothing is trying to instantiate the template in dumper2.cpp, so the template code is simply skipped. The compiler won't try every possible type for the template, hoping there will be something useful later for the linker.
Then, at link time, no implementation of dumpVector for the type int has been compiled, so the linker won't find any. Hence why you're seeing this error.
The export keyword is designed to solve this problem, unfortunately few compilers support it. So keep your implementation with the same file as your definition.

A template function is not real function. The compiler turns a template function into a real function when it encounters a use of that function. So the entire template declaration has to be in scope it finds the call to DumpVector, otherwise it can't generate the real function.
Amazingly, a lot of C++ intro books get this wrong.

This is exactly how templates work in C++, you must put the implementation in the header.
When you declare/define a template function, the compiler can't magically know which specific types you may wish to use the template with, so it can't generate code to put into a .o file like it could with a normal function. Instead, it relies on generating a specific instantiation for a type when it sees the use of that instantiation.
So when the implementation is in the .C file, the compiler basically says "hey, there are no users of this template, don't generate any code". When the template is in the header, the compiler is able to see the use in main and actually generate the appropriate template code.

Most compilers don't allow you to put template function definitions in a separate source file, even though this is technically allowed by the standard.
See also:
http://www.parashift.com/c++-faq-lite/templates.html#faq-35.12
http://www.parashift.com/c++-faq-lite/templates.html#faq-35.14

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js